
Showing papers in "IEEE Transactions on Multimedia in 2006"


Journal ArticleDOI
TL;DR: This paper addresses the problem of streaming packetized media over a lossy packet network in a rate-distortion optimized way, and derives a fast practical algorithm for nearly optimal streaming and a general purpose iterative descent algorithm for locally optimal streaming in arbitrary scenarios.
Abstract: This paper addresses the problem of streaming packetized media over a lossy packet network in a rate-distortion optimized way. We show that although the data units in a media presentation generally depend on each other according to a directed acyclic graph, the problem of rate-distortion optimized streaming of an entire presentation can be reduced to the problem of error-cost optimized transmission of an isolated data unit. We show how to solve the latter problem in a variety of scenarios, including the important common scenario of sender-driven streaming with feedback over a best-effort network, which we couch in the framework of Markov decision processes. We derive a fast practical algorithm for nearly optimal streaming in this scenario, and we derive a general purpose iterative descent algorithm for locally optimal streaming in arbitrary scenarios. Experimental results show that systems based on our algorithms have steady-state gains of 2-6 dB or more over systems that are not rate-distortion optimized. Furthermore, our systems essentially achieve the best possible performance: the operational distortion-rate function of the source at the capacity of the packet erasure channel.
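As a toy illustration of the rate-distortion trade-off described above (not the paper's Markov-decision-process formulation), the following sketch greedily chooses which data units of a dependency DAG to transmit under a rate budget, favoring units with the largest distortion reduction per bit; the unit names, sizes, and distortion values are hypothetical.

```python
# Minimal sketch (not the paper's algorithm): greedy Lagrangian-style selection of
# data units to transmit under a rate budget, respecting DAG dependencies.
# A unit is only useful if all units it depends on are also sent.

def select_units(units, deps, rate_budget):
    """units: {id: (size_bits, distortion_reduction)}; deps: {id: [parent ids]}."""
    sent, used = set(), 0
    while True:
        # candidates whose parents are already scheduled and that still fit the budget
        cand = [u for u in units
                if u not in sent
                and all(p in sent for p in deps.get(u, []))
                and used + units[u][0] <= rate_budget]
        if not cand:
            return sent
        # pick the unit with the best distortion reduction per bit
        best = max(cand, key=lambda u: units[u][1] / units[u][0])
        sent.add(best)
        used += units[best][0]

if __name__ == "__main__":
    # toy GOP: an I-frame and two dependent P-frames (hypothetical numbers)
    units = {"I0": (8000, 40.0), "P1": (3000, 10.0), "P2": (2500, 8.0)}
    deps = {"P1": ["I0"], "P2": ["P1"]}
    print(select_units(units, deps, rate_budget=12000))
```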

736 citations


Journal ArticleDOI
TL;DR: Novel methods to evaluate the performance of object detection algorithms in video sequences are proposed and segmentation algorithms recently proposed are evaluated in order to assess how well they can detect moving regions in an outdoor scene in fixed-camera situations.
Abstract: In this paper, we propose novel methods to evaluate the performance of object detection algorithms in video sequences. This procedure allows us to highlight characteristics (e.g., region splitting or merging) which are specific to the method being used. The proposed framework compares the output of the algorithm with the ground truth and measures the differences according to objective metrics. In this way it is possible to perform a fair comparison among different methods, evaluating their strengths and weaknesses and allowing the user to make a reliable choice of the best method for a specific application. We apply this methodology to recently proposed segmentation algorithms and describe their performance. These methods were evaluated in order to assess how well they can detect moving regions in an outdoor scene in fixed-camera situations.
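The kind of objective, ground-truth-based matching such a framework relies on can be sketched as follows; the intersection-over-union threshold and the split/merge counting rule here are illustrative assumptions, not the paper's exact metrics.

```python
# Minimal sketch: match detected and ground-truth boxes by intersection-over-union
# and report detections, misses, false alarms, splits, and merges.

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def evaluate(detections, ground_truth, thr=0.2):
    matches = {g: [d for d in detections if iou(d, g) > thr] for g in ground_truth}
    det_hits = {d: sum(iou(d, g) > thr for g in ground_truth) for d in detections}
    return {
        "detected": sum(1 for g in matches if matches[g]),
        "missed": sum(1 for g in matches if not matches[g]),
        "false_alarms": sum(1 for d in det_hits if det_hits[d] == 0),
        "splits": sum(1 for g in matches if len(matches[g]) > 1),   # one object, many regions
        "merges": sum(1 for d in det_hits if det_hits[d] > 1),      # one region, many objects
    }

if __name__ == "__main__":
    gt = [(10, 10, 50, 50), (60, 10, 100, 50)]
    det = [(12, 12, 48, 48), (58, 8, 102, 52)]
    print(evaluate(det, gt))
```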

266 citations


Journal ArticleDOI
TL;DR: A spatio-temporal approach to recognizing six universal facial expressions from visual data and using them to compute levels of interest is presented; the computed levels of interest were found to be consistent with "ground truth" information in most cases.
Abstract: This paper presents a spatio-temporal approach to recognizing six universal facial expressions from visual data and using them to compute levels of interest. The classification approach relies on a two-step strategy on top of projected facial motion vectors obtained from video sequences of facial expressions. First, a linear classification bank was applied to projected optical flow vectors, and decisions made by the linear classifiers were coalesced to produce a characteristic signature for each universal facial expression. The signatures thus computed from the training data set were used to train discrete hidden Markov models (HMMs) to learn the underlying model for each facial expression. The performance of the proposed facial expression recognition approach was evaluated using five-fold cross-validation on the Cohn-Kanade facial expressions database, which consists of 488 video sequences from 97 subjects. The proposed approach achieved an average recognition rate of 90.9% on this database. Recognized facial expressions were mapped to levels of interest using the affect space and the intensity of motion around the apex frame. The computed level of interest was subjectively analyzed and was found to be consistent with "ground truth" information in most of the cases. To further illustrate the efficacy of the proposed approach, and also to better understand the effects of a number of factors that are detrimental to facial expression recognition, a number of experiments were conducted. The first empirical analysis was conducted on a database consisting of 108 facial expressions collected from TV broadcasts and labeled by human coders for subsequent analysis. The second experiment (emotion elicitation) was conducted on facial expressions obtained from 21 subjects by showing the subjects six different movie clips chosen to arouse spontaneous emotional reactions that would produce natural facial expressions.

246 citations


Journal ArticleDOI
TL;DR: Subjective and objective tests reveal that the proposed watermarking scheme maintains high audio quality and is simultaneously highly robust to pirate attacks, including MP3 compression, low-pass filtering, amplitude scaling, time scaling, digital-to-analog/analog- to-digital reacquisition, cropping, sampling rate change, and bit resolution transformation.
Abstract: This work proposes a method of embedding digital watermarks into audio signals in the time domain. The proposed algorithm exploits differential average-of-absolute-amplitude relations within each group of audio samples to represent one-bit information. The principle of low-frequency amplitude modification is employed to scale amplitudes in a group manner (unlike the sample-by-sample manner as used in pseudonoise or spread-spectrum techniques) in selected sections of samples so that the time-domain waveform envelope can be almost preserved. Besides, when the frequency-domain characteristics of the watermark signal are controlled by applying absolute hearing thresholds in the psychoacoustic model, the distortion associated with watermarking is hardly perceivable by human ears. The watermark can be blindly extracted without knowledge of the original signal. Subjective and objective tests reveal that the proposed watermarking scheme maintains high audio quality and is simultaneously highly robust to pirate attacks, including MP3 compression, low-pass filtering, amplitude scaling, time scaling, digital-to-analog/analog-to-digital reacquisition, cropping, sampling rate change, and bit resolution transformation. Security of embedded watermarks is enhanced by adopting unequal section lengths determined by a secret key.
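A minimal sketch of the group-wise embedding idea (with hypothetical group size and embedding strength, not the paper's exact rule): one bit is hidden per group by forcing the difference between the mean absolute amplitudes of the group's two halves to take a chosen sign.

```python
# Minimal sketch: hide one bit per group of samples by making the difference between
# the mean absolute amplitudes of the group's two halves positive (bit 1) or negative (bit 0).
import numpy as np

def embed_bit(group, bit, strength=0.05):
    half = len(group) // 2
    a, b = group[:half].copy(), group[half:].copy()
    target = strength if bit else -strength          # desired sign and margin of the difference
    diff = np.mean(np.abs(a)) - np.mean(np.abs(b))
    adjust = (target - diff) / 2.0
    a *= 1.0 + adjust / (np.mean(np.abs(a)) + 1e-9)  # scale amplitudes group-wise,
    b *= 1.0 - adjust / (np.mean(np.abs(b)) + 1e-9)  # roughly preserving the waveform envelope
    return np.concatenate([a, b])

def extract_bit(group):
    half = len(group) // 2
    return int(np.mean(np.abs(group[:half])) >= np.mean(np.abs(group[half:])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(0, 0.2, 1024)
    marked = embed_bit(audio, 1)
    print("recovered bit:", extract_bit(marked))
```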

235 citations


Journal ArticleDOI
TL;DR: A randomized arithmetic coding paradigm is introduced, which achieves encryption by inserting some randomization in the arithmetic coding procedure, and unlike previous works on encryption by arithmetic coding, this is done at no expense in terms of coding efficiency.
Abstract: We propose a novel multimedia security framework based on a modification of the arithmetic coder, which is used by most international image and video coding standards as their entropy coding stage. In particular, we introduce a randomized arithmetic coding paradigm, which achieves encryption by inserting some randomization in the arithmetic coding procedure; notably, and unlike previous works on encryption by arithmetic coding, this is done at no expense in terms of coding efficiency. The proposed technique can be applied to any multimedia coder employing arithmetic coding; in this paper we describe an implementation tailored to the JPEG 2000 standard. The proposed approach turns out to be robust against attempts to estimate the image or discover the key, and allows very flexible protection procedures at the code-block level, enabling total and selective encryption as well as conditional access.

230 citations


Journal ArticleDOI
TL;DR: It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video.
Abstract: Identification and verification of a video clip via its fingerprint find applications in video browsing, database search and security. For this purpose, the video sequence must be collapsed into a short fingerprint using a robust hash function based on signal processing operations. We propose two robust hash algorithms for video, both based on the discrete cosine transform (DCT): one on the classical basis set and the other on a novel randomized basis set (RBT). The robustness and randomness properties of the proposed hash functions are investigated in detail. It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video. The DCT hash is more robust, but lacks a security aspect, as it is easy to find different video clips with the same hash value. The RBT-based hash, being secret-key based, does not allow this and is more secure at the cost of a slight loss in the receiver operating curves.
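A minimal sketch of a DCT-style robust video hash on the classical basis (the randomized-basis variant and the paper's exact preprocessing are not reproduced): temporally average the clip, take a 2-D DCT of the luminance, and binarize low-frequency coefficients around their median.

```python
# Minimal sketch of a DCT-based robust hash: average frames temporally, apply a 2-D DCT,
# keep low-frequency coefficients and binarize them around their median.
import numpy as np

def dct_matrix(n):
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c                                        # orthonormal DCT-II basis

def video_hash(frames, keep=8):
    """frames: array (T, H, W) of luminance values; returns a binary hash vector."""
    avg = frames.mean(axis=0)                       # temporal average for robustness
    n = min(avg.shape)
    avg = avg[:n, :n]                               # crude square crop for the sketch
    c = dct_matrix(n)
    coeffs = c @ avg @ c.T                          # 2-D DCT
    low = coeffs[:keep, :keep].ravel()[1:]          # low frequencies, DC dropped
    return (low > np.median(low)).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clip = rng.random((30, 64, 64))
    noisy = clip + rng.normal(0, 0.02, clip.shape)  # mild "transmission impairment"
    h1, h2 = video_hash(clip), video_hash(noisy)
    print("Hamming distance:", int(np.sum(h1 != h2)), "of", h1.size)
```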

217 citations


Journal ArticleDOI
TL;DR: Experimental results are reported on a real-world image collection to demonstrate that the proposed methods outperform the traditional kernel BDA (KBDA) and the support vector machine (SVM) based RF algorithms.
Abstract: In recent years, a variety of relevance feedback (RF) schemes have been developed to improve the performance of content-based image retrieval (CBIR). Given user feedback information, the key to an RF scheme is how to select a subset of image features to construct a suitable dissimilarity measure. Among various RF schemes, biased discriminant analysis (BDA) based RF is one of the most promising. It is based on the observation that all positive samples are alike, while in general each negative sample is negative in its own way. However, to use BDA, the small sample size (SSS) problem is a big challenge, as users tend to give a small number of feedback samples. To explore solutions to this issue, this paper proposes a direct kernel BDA (DKBDA), which is less sensitive to SSS. An incremental DKBDA (IDKBDA) is also developed to speed up the analysis. Experimental results are reported on a real-world image collection to demonstrate that the proposed methods outperform the traditional kernel BDA (KBDA) and the support vector machine (SVM) based RF algorithms.

188 citations


Journal ArticleDOI
TL;DR: A two-layer hidden Markov model (HMM) framework is proposed that implements such a concept in a principled manner, has advantages over previous works, and is easier to interpret and to improve.
Abstract: We address the problem of recognizing sequences of human interaction patterns in meetings, with the goal of structuring them in semantic terms. The investigated patterns are inherently group-based (defined by the individual activities of meeting participants, and their interplay), and multimodal (as captured by cameras and microphones). By defining a proper set of individual actions, group actions can be modeled as a two-layer process, one that models basic individual activities from low-level audio-visual (AV) features, and another one that models the interactions. We propose a two-layer hidden Markov model (HMM) framework that implements such a concept in a principled manner, and that has advantages over previous works. First, by decomposing the problem hierarchically, learning is performed on low-dimensional observation spaces, which results in simpler models. Second, our framework is easier to interpret, as both individual and group actions have a clear meaning, and thus easier to improve. Third, different HMMs can be used in each layer, to better reflect the nature of each subproblem. Our framework is general and extensible, and we illustrate it with a set of eight group actions, using a public 5-hour meeting corpus. Experiments and comparison with a single-layer HMM baseline system show its validity.
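A rough two-layer sketch of the idea, assuming the hmmlearn package and made-up features and action sets (not the paper's models): a first HMM labels low-level AV features with individual actions, and a second HMM models group actions over statistics of those labels.

```python
# Two-layer HMM sketch with hmmlearn (hypothetical features and action sets).
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# --- layer 1: individual-action HMM over low-dimensional AV features ---
n_individual_actions = 3          # e.g., speaking / writing / idle (assumed set)
layer1 = hmm.GaussianHMM(n_components=n_individual_actions, covariance_type="diag",
                         n_iter=20, random_state=0)
av_features = rng.normal(size=(600, 4))        # stand-in for per-participant features
layer1.fit(av_features)
individual_labels = layer1.predict(av_features)

# --- layer 2: group-action HMM over histograms of individual labels per window ---
win = 20
hists = np.array([np.bincount(individual_labels[i:i + win],
                              minlength=n_individual_actions) / win
                  for i in range(0, len(individual_labels) - win, win)])
layer2 = hmm.GaussianHMM(n_components=4, covariance_type="diag",
                         n_iter=20, random_state=0)   # e.g., discussion/monologue/...
layer2.fit(hists)
print("group-action labels:", layer2.predict(hists)[:10])
```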

183 citations


Journal ArticleDOI
TL;DR: This paper introduces Daubechies Wavelet Coefficient Histograms (DWCH) for music feature extraction for music information retrieval and conducts a proof-of-concept experiment on similarity search using the feature set.
Abstract: Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of intelligent music information retrieval. Huron points out that since the preeminent functions of music are social and psychological, the most useful characterization would be based on four types of information: genre, emotion, style, and similarity. This paper introduces Daubechies Wavelet Coefficient Histograms (DWCH) for music feature extraction for music information retrieval. The histograms are computed from the coefficients of the db8 Daubechies wavelet filter applied to 3 s of music. A comparative study of sound features and classification algorithms on a dataset compiled by Tzanetakis shows that combining DWCH with timbral features (MFCC and FFT), with the use of multiclass extensions of support vector machines, achieves approximately 80% accuracy, which is a significant improvement over the previously known result on this dataset. On another dataset the combination achieves 75% accuracy. The paper also studies the issue of detecting emotion in music. Ratings from two subjects on three bipolar adjective pairs are used. An accuracy of around 70% was achieved in predicting emotional labeling in these adjective pairs. The paper also studies the problem of identifying groups of artists based on their lyrics and sound using a semi-supervised classification algorithm. Identification of artist groups based on the Similar Artist lists at All Music Guide is attempted. The semi-supervised learning algorithm resulted in nontrivial increases in the accuracy to more than 70%. Finally, the paper conducts a proof-of-concept experiment on similarity search using the feature set.
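A sketch of Daubechies-wavelet-based features in the spirit of DWCH, using PyWavelets; the decomposition level, histogram bins, and moment choices are assumptions rather than the paper's exact feature definition.

```python
# DWCH-style sketch: decompose ~3 s of audio with the 'db8' wavelet, then keep a
# coarse histogram plus simple statistics of the coefficients in each subband.
import numpy as np
import pywt

def dwch_features(signal, wavelet="db8", level=7, bins=8):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for band in coeffs:
        hist, _ = np.histogram(band, bins=bins, density=True)
        feats.extend(hist)
        feats.extend([band.mean(), band.std(), np.mean(np.abs(band))])
    return np.array(feats)

if __name__ == "__main__":
    sr = 22050
    t = np.arange(3 * sr) / sr
    audio = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)
    print("feature vector length:", dwch_features(audio).shape[0])
```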

169 citations


Journal ArticleDOI
TL;DR: A new methodology is presented for developing perceptually accurate models for nonintrusive prediction of voice quality that avoids time-consuming subjective tests; the methodology is generic and has wide applicability in multimedia applications.
Abstract: The primary aim of this paper is to present new models for objective, nonintrusive, prediction of voice quality for IP networks and to illustrate their application to voice quality monitoring and playout buffer control in VoIP networks. The contributions of the paper are threefold. First, we present a new methodology for developing perceptually accurate models for nonintrusive prediction of voice quality which avoids time-consuming subjective tests. The methodology is generic and as such it has wide applicability in multimedia applications. Second, based on the new methodology, we present efficient regression models for predicting conversational voice quality nonintrusively for four modern codecs (G.729, G.723.1, AMR and iLBC). Third, we illustrate the usefulness of the models in two main applications - voice quality prediction for real Internet VoIP traces and perceived quality-driven playout buffer optimization. For voice quality prediction, the results show that the models have accuracy close to the combined ITU PESQ/E-model method using real Internet traces (correlation coefficient over 0.98). For playout buffer optimization, the proposed buffer algorithm provides an optimum voice quality when compared to five other buffer algorithms for all the traces considered.
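A minimal sketch of a nonintrusive model of the general kind described (the paper's models are trained per codec on subjectively validated data; here the inputs and the MOS-like target are synthetic stand-ins): learn a regression from measurable network parameters to a quality score.

```python
# Sketch: nonintrusive quality prediction as regression from network/codec parameters.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 500
loss = rng.uniform(0, 0.2, n)            # packet loss rate
delay = rng.uniform(0, 400, n)           # one-way delay in ms
jitter = rng.uniform(0, 60, n)           # jitter in ms
X = np.column_stack([loss, delay, jitter])

# synthetic "listening quality" target just to make the sketch runnable
mos = 4.4 - 9.0 * loss - 0.004 * delay - 0.01 * jitter + rng.normal(0, 0.1, n)
mos = np.clip(mos, 1.0, 4.5)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, mos)
print("predicted MOS at 5% loss, 150 ms delay, 20 ms jitter:",
      round(float(model.predict([[0.05, 150, 20]])[0]), 2))
```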

169 citations


Journal ArticleDOI
TL;DR: This work designs models using the results of a subjective test based on 1080 packet losses in 72 minutes of video, and develops three methods, which differ in the amount of information available to them.
Abstract: We consider the problem of predicting packet loss visibility in MPEG-2 video. We use two modeling approaches: CART and GLM. The former classifies each packet loss as visible or not; the latter predicts the probability that a packet loss is visible. For each modeling approach, we develop three methods, which differ in the amount of information available to them. A reduced reference method has access to limited information based on the video at the encoder's side and has access to the video at the decoder's side. A no-reference pixel-based method has access to the video at the decoder's side but lacks access to information at the encoder's side. A no-reference bitstream-based method does not have access to the decoded video either; it has access only to the compressed video bitstream, potentially affected by packet losses. We design our models using the results of a subjective test based on 1080 packet losses in 72 minutes of video.
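The two modeling approaches named above can be sketched with standard tools on synthetic data (the real models are fit to subjective-test results, not this toy target): a CART tree classifies each loss as visible or not, and a logistic GLM predicts the visibility probability.

```python
# Sketch: CART classifier and logistic GLM for packet loss visibility, on synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
motion = rng.uniform(0, 1, n)            # per-loss features (stand-ins)
error_duration = rng.integers(1, 30, n)  # frames affected by the loss
spatial_extent = rng.uniform(0, 1, n)
X = np.column_stack([motion, error_duration, spatial_extent])

# toy ground truth: long-lasting, large, high-motion errors are visible
visible = ((motion + spatial_extent + error_duration / 30.0) > 1.5).astype(int)

cart = DecisionTreeClassifier(max_depth=4).fit(X, visible)
glm = LogisticRegression(max_iter=1000).fit(X, visible)

probe = [[0.8, 20, 0.7]]
print("CART says visible:", bool(cart.predict(probe)[0]))
print("GLM visibility probability:", round(float(glm.predict_proba(probe)[0, 1]), 3))
```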

Journal ArticleDOI
TL;DR: A semantic analysis system based on Bayesian network (BN) and dynamic Bayesian network (DBN) that can identify the special events in soccer games such as goal event, corner kick event, penalty kick event, and card event is introduced.
Abstract: Video semantic analysis is formulated based on the low-level image features and the high-level knowledge which is encoded in abstract, nongeometric representations. This paper introduces a semantic analysis system based on a Bayesian network (BN) and a dynamic Bayesian network (DBN). It is validated in the particular domain of soccer game videos. Based on the BN/DBN, it can identify the special events in soccer games such as goal event, corner kick event, penalty kick event, and card event. The video analyzer extracts the low-level evidence, whereas the semantic analyzer uses the BN/DBN to interpret the high-level semantics. Different from previous shot-based semantic analysis approaches, the proposed semantic analysis is frame-based: for each input frame, it provides the current semantics of the event nodes as well as the hidden nodes. Another contribution is that the BN and DBN are automatically generated by the training process instead of being determined ad hoc. The last contribution is that we introduce a so-called temporal intervening network to improve the accuracy of the semantics output.

Journal ArticleDOI
TL;DR: A novel content-dependent localized robust audio watermarking scheme that shows strong robustness against common audio signal processing, time-domain synchronization attacks, and most distortions introduced in Stirmark for Audio.
Abstract: Synchronization attacks like random cropping and time-scale modification are very challenging problems to audio watermarking techniques. To combat these attacks, a novel content-dependent localized robust audio watermarking scheme is proposed. The basic idea is to first select steady high-energy local regions that represent music edges like note attacks, transitions or drum sounds by using different methods, then embed the watermark in these regions. Such regions are of great importance to the understanding of music and will not be changed much for maintaining high auditory quality. In this way, the embedded watermark has the potential to escape all kinds of distortions. Experimental results show strong robustness against common audio signal processing, time-domain synchronization attacks, and most distortions introduced in Stirmark for Audio.
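A minimal sketch of the region-selection step (window size, peak test, and number of regions are assumptions): find steady high-energy local regions, such as note attacks, by peaks in short-time energy; these are the places where a watermark could then be embedded.

```python
# Sketch: locate high-energy local regions (e.g., note attacks) via short-time energy peaks.
import numpy as np

def high_energy_regions(audio, sr, win_ms=20, top_k=5):
    win = int(sr * win_ms / 1000)
    frames = audio[: len(audio) // win * win].reshape(-1, win)
    energy = (frames ** 2).sum(axis=1)
    # a peak is a frame with clearly higher energy than its neighbors
    peaks = [i for i in range(1, len(energy) - 1)
             if energy[i] > 1.5 * energy[i - 1] and energy[i] > energy[i + 1]]
    peaks.sort(key=lambda i: energy[i], reverse=True)
    return [(i * win, (i + 1) * win) for i in peaks[:top_k]]   # sample ranges

if __name__ == "__main__":
    sr = 16000
    t = np.arange(2 * sr) / sr
    audio = 0.05 * np.random.randn(t.size)
    for onset in (0.4, 1.1, 1.6):                              # synthetic "note attacks"
        idx = int(onset * sr)
        audio[idx:idx + 2000] += np.sin(2 * np.pi * 330 * t[:2000]) * np.exp(-np.arange(2000) / 800)
    print("candidate embedding regions (sample ranges):", high_energy_regions(audio, sr))
```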

Journal ArticleDOI
TL;DR: Through simulation and wide-area measurement studies, it is verified that the proposed incentive mechanism can provide near optimal streaming quality to the cooperative users until the bottleneck shifts from the streaming sources to the network.
Abstract: We propose a service differentiated peer selection mechanism for peer-to-peer media streaming systems. The mechanism provides flexibility and choice in peer selection to the contributors of the system, resulting in high quality streaming sessions. Free-riders are given limited options in peer selection, if any, and hence receive low quality streaming. The proposed incentive mechanism follows the characteristics of rank-order tournaments theory that considers only the relative performance of the players, and the top prizes are awarded to the winners of the tournament. Using rank-order tournaments, we analyze the behavior of utility maximizing users. Through simulation and wide-area measurement studies, we verify that the proposed incentive mechanism can provide near optimal streaming quality to the cooperative users until the bottleneck shifts from the streaming sources to the network.
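A loose sketch of service-differentiated peer selection in a rank-order-tournament spirit (the scoring and tier sizes are assumptions, not the paper's mechanism): peers ranked by contribution choose from better supplier pools, while free-riders get the leftovers.

```python
# Sketch: rank peers by contribution; higher ranks pick suppliers from better pools.
def assign_suppliers(peers, suppliers, tiers=(0.2, 0.5)):
    """peers: {name: contribution_kbps}; suppliers: list sorted best-first."""
    ranked = sorted(peers, key=peers.get, reverse=True)
    n = len(ranked)
    allocation = {}
    for rank, peer in enumerate(ranked):
        if rank < tiers[0] * n:            # top contributors choose among the best sources
            pool = suppliers[: len(suppliers) // 3]
        elif rank < tiers[1] * n:          # middle tier
            pool = suppliers[len(suppliers) // 3: 2 * len(suppliers) // 3]
        else:                              # free-riders: whatever remains
            pool = suppliers[2 * len(suppliers) // 3:]
        pool = pool or suppliers           # fall back if a tier happens to be empty
        allocation[peer] = pool[rank % len(pool)]
    return allocation

if __name__ == "__main__":
    peers = {"alice": 800, "bob": 500, "carol": 120, "dave": 0, "eve": 0}
    suppliers = ["src-1Mbps", "src-768k", "src-512k", "src-256k", "src-128k", "src-64k"]
    print(assign_suppliers(peers, suppliers))
```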

Journal ArticleDOI
Xiaohui Gu, Klara Nahrstedt
TL;DR: This paper presents a fully decentralized service composition framework, called SpiderNet, to address the challenges of distributed multimedia service composition, and provides statistical multiconstrained QoS assurances and load balancing for service composition.
Abstract: Service composition allows multimedia services to be automatically composed from atomic service components based on dynamic service requirements. Previous work falls short for distributed multimedia service composition in terms of scalability, flexibility and quality-of-service (QoS) management. In this paper, we present a fully decentralized service composition framework, called SpiderNet, to address the challenges. SpiderNet provides statistical multiconstrained QoS assurances and load balancing for service composition. Moreover, SpiderNet supports directed acyclic graph composition topologies and exchangeable composition orders. We have implemented a prototype of SpiderNet and conducted experiments on both wide-area networks and a simulation testbed. Our experimental results show the feasibility and efficiency of the SpiderNet service composition framework.

Journal ArticleDOI
TL;DR: Two new semi-fragile authentication techniques robust against lossy compression are proposed, using random bias and nonuniform quantization, to improve the performance of the methods proposed by Lin and Chang.
Abstract: Semi-fragile watermarking techniques aim at detecting malicious manipulations on an image, while allowing acceptable manipulations such as lossy compression. Although both of these manipulations are considered to be pixel value changes, semi-fragile watermarks should be sensitive to malicious manipulations but robust to the degradation introduced by lossy compression and other defined acceptable manipulations. In this paper, after studying the characteristics of both natural images and malicious manipulations, we propose two new semi-fragile authentication techniques robust against lossy compression, using random bias and nonuniform quantization, to improve the performance of the methods proposed by Lin and Chang.
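A minimal quantization-based sketch in the spirit of key-dependent biased embedding (step size and bias generation are assumptions, and this is not the Lin-Chang-style scheme the paper improves): each selected coefficient is moved to an even or odd lattice point, shifted by a secret bias, to carry one authentication bit that survives mild distortion but not large changes.

```python
# Sketch: embed authentication bits by quantizing coefficients onto a key-biased lattice.
import numpy as np

def embed(coeffs, bits, key, step=8.0):
    rng = np.random.default_rng(key)
    bias = rng.uniform(0, step, size=len(coeffs))           # secret, key-dependent bias
    out = coeffs.copy()
    for i, bit in enumerate(bits):
        q = np.round((coeffs[i] - bias[i]) / step)
        if int(q) % 2 != bit:                               # move to the right parity
            q += 1 if (coeffs[i] - bias[i]) / step > q else -1
        out[i] = q * step + bias[i]
    return out

def extract(coeffs, n_bits, key, step=8.0):
    rng = np.random.default_rng(key)
    bias = rng.uniform(0, step, size=len(coeffs))
    return [int(np.round((coeffs[i] - bias[i]) / step)) % 2 for i in range(n_bits)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coeffs = rng.normal(0, 30, 16)
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    marked = embed(coeffs, bits, key=42)
    mildly_compressed = marked + rng.normal(0, 1.0, 16)     # acceptable distortion
    print("bits survive mild distortion:", extract(mildly_compressed, 16, key=42) == bits)
```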

Journal ArticleDOI
TL;DR: A general framework for temporal scene segmentation is presented that is able to find weak boundaries as well as strong boundaries, i.e., it does not rely on a fixed threshold, and can be applied to different video domains.
Abstract: Videos are composed of many shots that are caused by different camera operations, e.g., on/off operations and switching between cameras. One important goal in video analysis is to group the shots into temporal scenes, such that all the shots in a single scene are related to the same subject, which could be a particular physical setting, an ongoing action or a theme. In this paper, we present a general framework for temporal scene segmentation in various video domains. The proposed method is formulated in a statistical fashion and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, a set of arbitrary scene boundaries are initialized at random locations and are automatically updated using two types of updates: diffusion and jumps. Diffusion is the process of updating the boundaries between adjacent scenes. Jumps consist of two reversible operations: the merging of two scenes and the splitting of an existing scene. The posterior probability of the target distribution of the number of scenes and their corresponding boundary locations is computed based on the model priors and the data likelihood. The updates of the model parameters are controlled by the hypothesis ratio test in the MCMC process, and the samples are collected to generate the final scene boundaries. The major advantage of the proposed framework is two-fold: 1) it is able to find the weak boundaries as well as the strong boundaries, i.e., it does not rely on a fixed threshold; 2) it can be applied to different video domains. We have tested the proposed method on two video domains: home videos and feature films, and accurate results have been obtained.

Journal ArticleDOI
TL;DR: An adaptive memoryless protocol is presented, which is an improvement on the query tree protocol; it causes fewer collisions and incurs a shorter delay in recognizing all tags while preserving lower communication overhead than other tree-based tag anticollision protocols.
Abstract: A radio frequency identification (RFID) reader recognizes objects through wireless communications with RFID tags. Tag collision arbitration for passive tags is a significant issue for fast tag identification due to communication over a shared wireless channel. This paper presents an adaptive memoryless protocol, which is an improvement on the query tree protocol. Memoryless means that tags need not have any additional memory except their ID for identification. To reduce collisions and identify tags promptly, we use information obtained from the last process of tag identification at a reader. Our performance evaluation shows that the adaptive memoryless protocol causes fewer collisions and incurs a shorter delay in recognizing all tags while preserving lower communication overhead than other tree-based tag anticollision protocols.
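For intuition, the basic (non-adaptive) query tree procedure that the protocol improves on can be simulated in a few lines: the reader queries bit prefixes, colliding prefixes are split, and the query and collision counts show the cost the adaptive variant aims to reduce. Tag IDs and lengths below are arbitrary.

```python
# Sketch of the basic query tree protocol: tags whose ID starts with the queried
# prefix reply; collisions make the reader extend the prefix by one bit.
def query_tree(tag_ids, id_bits=8):
    queue, identified = [""], []
    queries = collisions = 0
    while queue:
        prefix = queue.pop(0)
        queries += 1
        responders = [t for t in tag_ids if format(t, f"0{id_bits}b").startswith(prefix)]
        if len(responders) == 1:
            identified.append(responders[0])
        elif len(responders) > 1:                 # collision: split the prefix
            collisions += 1
            queue.extend([prefix + "0", prefix + "1"])
    return identified, queries, collisions

if __name__ == "__main__":
    tags = [0b00010110, 0b00010111, 0b10110001, 0b11100010, 0b01100101]
    ids, q, c = query_tree(tags)
    print(f"identified {len(ids)} tags with {q} queries and {c} collisions")
```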

Journal ArticleDOI
TL;DR: A video bit allocation technique adopting a visual distortion sensitivity model for better rate-visual distortion coding control is proposed in this paper, and can be incorporated into existing video coding rate control schemes to achieve the same visual quality at a reduced bitrate.
Abstract: A video bit allocation technique adopting a visual distortion sensitivity model for better rate-visual distortion coding control is proposed in this paper. Instead of applying complicated semantic understanding, the proposed automatic distortion sensitivity analysis process analyzes both the motion and the texture structures in the video sequences in order to achieve better bit allocation for rate-constrained video coding. The proposed technique evaluates the perceptual distortion sensitivity on a macroblock basis, and allocates fewer bits to regions permitting large perceptual distortions for rate reduction. The proposed algorithm can be incorporated into existing video coding rate control schemes to achieve the same visual quality at a reduced bitrate. Experiments based on H.264 JM7.6 show that this technique achieves bit-rate savings of up to 40.61%, while the conducted subjective viewing experiments show no perceptual quality degradation.
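The allocation principle can be sketched as a simple proportional rule (the weights below are illustrative, not the paper's sensitivity model): macroblocks that tolerate more perceptual distortion get a smaller share of the frame's bit budget.

```python
# Sketch: sensitivity-weighted bit allocation across macroblocks.
def allocate_bits(sensitivities, frame_budget):
    """sensitivities: per-macroblock values in (0, 1]; higher = less tolerant of distortion."""
    total = sum(sensitivities)
    return [frame_budget * s / total for s in sensitivities]

if __name__ == "__main__":
    # e.g., textured/moving background (distortion masked, low sensitivity)
    # versus a region the viewer fixates on (high sensitivity)
    mb_sensitivity = [0.2, 0.2, 0.3, 0.9, 1.0, 0.8, 0.3, 0.2]
    print([round(b) for b in allocate_bits(mb_sensitivity, frame_budget=24000)])
```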

Journal ArticleDOI
TL;DR: A method is proposed to automatically generate video summaries using transcripts obtained by automatic speech recognition: the full program is divided into segments based on pause detection and a score is derived for each segment, based on the frequencies of the words and bigrams it contains.
Abstract: Compact representations of video data greatly enhance efficient video browsing. Such representations provide the user with information about the content of the particular sequence being examined while preserving the essential message. We propose a method to automatically generate video summaries using transcripts obtained by automatic speech recognition. We divide the full program into segments based on pause detection and derive a score for each segment, based on the frequencies of the words and bigrams it contains. Then, a summary is generated by selecting the segments with the highest score-to-duration ratios while at the same time maximizing the coverage of the summary over the full program. We developed an experimental design and a user study to judge the quality of the generated video summaries. We compared the informativeness of the proposed algorithm with two other algorithms for three different programs. The results of the user study demonstrate that the proposed algorithm produces more informative summaries than the other two algorithms.
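The selection step can be sketched as follows; the scoring is a simplified assumption (raw word and bigram counts) and the coverage constraint of the actual method is omitted.

```python
# Sketch: score pause-delimited transcript segments by word/bigram frequencies and
# greedily pick the best score-to-duration ratios within a summary length budget.
from collections import Counter

def summarize(segments, budget_sec):
    """segments: list of (transcript_text, duration_seconds)."""
    words, bigrams = Counter(), Counter()
    for text, _ in segments:
        toks = text.lower().split()
        words.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    def score(text):
        toks = text.lower().split()
        return sum(words[w] for w in toks) + sum(bigrams[b] for b in zip(toks, toks[1:]))

    ranked = sorted(segments, key=lambda s: score(s[0]) / s[1], reverse=True)
    summary, used = [], 0.0
    for text, dur in ranked:
        if used + dur <= budget_sec:
            summary.append(text)
            used += dur
    return summary

if __name__ == "__main__":
    segs = [("the election results were announced tonight", 6.0),
            ("and now a word from our sponsors", 5.0),
            ("the election was closely watched across the country", 7.0),
            ("weather tomorrow will be sunny", 4.0)]
    print(summarize(segs, budget_sec=13.0))
```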

Journal ArticleDOI
TL;DR: An optimization framework is proposed, which enables the multiple senders to coordinate their packet transmission schedules, such that the average quality over all video clients is maximized, and is very efficient in terms of video quality.
Abstract: We consider the problem of distributed packet selection and scheduling for multiple video streams sharing a communication channel. An optimization framework is proposed, which enables the multiple senders to coordinate their packet transmission schedules, such that the average quality over all video clients is maximized. The framework relies on rate-distortion information that is used to characterize a video packet. This information consists of two quantities: the size of the packet in bits, and its importance for the reconstruction quality of the corresponding stream. A distributed streaming strategy then allows for trading off rate and distortion, not only within a single video stream, but also across different streams. Each of the senders allocates to its own video packets a share of the available bandwidth on the channel in proportion to their importance. We evaluate the performance of the distributed packet scheduling algorithm for two canonical problems in streaming media, namely adaptation to available bandwidth and adaptation to packet loss through prioritized packet retransmissions. Simulation results demonstrate that, for the difficult case of scheduling nonscalably encoded video streams, our framework is very efficient in terms of video quality, both over all streams jointly and also over the individual videos. Compared to a conventional streaming system that does not consider the relative importance of the video packets, the gains in performance range up to 6 dB for the scenario of bandwidth adaptation, and even up to 10 dB for the scenario of random packet loss adaptation.
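A minimal sketch of the proportional-share idea (the real framework schedules individual packets from rate-distortion data; here each sender simply claims channel share in proportion to the summed importance of its pending packets):

```python
# Sketch: split channel bandwidth across senders in proportion to packet importance.
def channel_shares(senders, channel_kbps):
    """senders: {name: [(packet_bits, importance), ...]}; returns kbps per sender."""
    weight = {s: sum(imp for _, imp in pkts) for s, pkts in senders.items()}
    total = sum(weight.values())
    return {s: channel_kbps * w / total for s, w in weight.items()}

if __name__ == "__main__":
    senders = {
        "stream_A": [(12000, 9.0), (4000, 2.0)],   # high-importance reference frames pending
        "stream_B": [(8000, 1.5), (8000, 1.0)],    # mostly low-importance packets
    }
    print(channel_shares(senders, channel_kbps=1000))
```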

Journal ArticleDOI
TL;DR: A two-phase trajectory-based detection and tracking algorithm is presented for locating the ball in broadcast soccer video (BSV); it achieves a high accuracy of about 81% and is able to reliably detect partially occluded or merged balls in the sequence.
Abstract: This paper presents a novel trajectory-based detection and tracking algorithm for locating the ball in broadcast soccer video (BSV). The problem of ball detection and tracking in BSV is well known to be very challenging because of the wide variation in the appearance of the ball over frames. Direct detection algorithms do not work well because the image of the ball may be distorted due to the high speed of the ball, occlusion, or merging with other objects in the frame. To overcome these challenges, we propose a two-phase trajectory-based algorithm in which we first generate a set of ball candidates for each frame, and then use them to compute the set of ball trajectories. Informally, the two key ideas behind our strategy are 1) while it is very challenging to achieve high accuracy in locating the precise location of the ball, it is relatively easy to achieve very high accuracy in locating the ball among a set of ball-like candidates and 2) it is much better to study the trajectory information of the ball since the ball is the "most active" object in the BSV. Once the ball trajectories are computed, the ball locations can be reliably recovered from them. One important advantage of our algorithm is that it is able to reliably detect partially occluded or merged balls in the sequence. Two videos from the 2002 FIFA World Cup were used to evaluate our algorithm. It achieves a high accuracy of about 81% for ball location.
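A rough sketch of the trajectory-growing idea (the gating distance and the "longest trajectory wins" rule are assumptions, not the paper's verification procedure): link per-frame ball candidates into trajectories by nearest-neighbor continuation and keep the longest one.

```python
# Sketch: grow trajectories from per-frame ball candidates by nearest-neighbor linking.
import math

def grow_trajectories(candidates_per_frame, max_jump=20.0):
    trajectories = []
    for frame, candidates in enumerate(candidates_per_frame):
        for point in candidates:
            best = None
            for traj in trajectories:
                last_frame, last_point = traj[-1]
                dist = math.dist(point, last_point)
                if last_frame == frame - 1 and dist <= max_jump:
                    if best is None or dist < math.dist(point, best[-1][1]):
                        best = traj
            if best is not None:
                best.append((frame, point))
            else:
                trajectories.append([(frame, point)])
    return max(trajectories, key=len) if trajectories else []

if __name__ == "__main__":
    # candidate lists per frame: the true ball moves smoothly, clutter jumps around
    frames = [[(10, 10), (80, 40)], [(14, 12), (30, 90)], [(18, 15)], [(23, 17), (60, 5)]]
    print(grow_trajectories(frames))
```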

Journal ArticleDOI
TL;DR: A solution to improve the throughput of an overlay multicast session with heterogeneous receivers by organizing the receivers into layered data distribution meshes and sending substreams to each mesh using layered coding is proposed.
Abstract: Recent advances in information theory show that the throughput of a multicast session can be improved using network coding. In overlay networks, the available bandwidth between the sender and different receivers differs. In this paper, we propose a solution to improve the throughput of an overlay multicast session with heterogeneous receivers by organizing the receivers into layered data distribution meshes and sending substreams to each mesh using layered coding. Our solution utilizes alternative paths and network coding in each mesh. We first formulate the problem as a mathematical program, whose optimal solution requires global information. We therefore present a distributed heuristic algorithm. The heuristic progressively organizes the receivers into layered meshes. Each receiver can subscribe to a proper number of meshes to maximize its throughput by fully utilizing its available bandwidth. The benefits of organizing the topology into layered meshes and using network coding are demonstrated through extensive simulations. Numerical results indicate that the average throughput of a multicast session is significantly improved (up to 50% to 60%) with only slightly higher delay and network resource consumption.

Journal ArticleDOI
TL;DR: The algorithm integrates two new techniques: i) a utility-based model using the rate-distortion function as the application utility measure for optimizing the overall video quality; and ii) a two-timescale approach of rate averages to satisfy both media and TCP-friendliness.
Abstract: This paper presents a media- and TCP-friendly rate-based congestion control algorithm (MTFRCC) for scalable video streaming in the Internet. The algorithm integrates two new techniques: i) a utility-based model using the rate-distortion function as the application utility measure for optimizing the overall video quality; and ii) a two-timescale approach of rate averages (long-term and short-term) to satisfy both media and TCP-friendliness. We evaluate our algorithm through simulation and compare the results against the TCP-friendly rate control (TFRC) algorithm. For assessment, we consider five criteria: TCP fairness, responsiveness, aggressiveness, overall video quality, and smoothness of the resulting bit rate. Our simulation results show that MTFRCC performs better than TFRC for various congestion levels, including an improvement of the overall video quality.
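Two ingredients named above can be sketched directly: the standard TCP-friendly rate equation used by TFRC-like schemes, and short-/long-term averaging of the sending rate on two timescales (the smoothing constants are assumptions, not the MTFRCC parameters).

```python
# Sketch: TCP-friendly rate equation plus two-timescale rate averaging.
import math

def tcp_friendly_rate(packet_size, rtt, loss_rate, rto=None, b=1):
    """TCP throughput equation (bytes/s) as used in the TFRC framework."""
    if loss_rate <= 0:
        return float("inf")
    rto = rto if rto is not None else 4 * rtt
    denom = rtt * math.sqrt(2 * b * loss_rate / 3) + \
        rto * (3 * math.sqrt(3 * b * loss_rate / 8)) * loss_rate * (1 + 32 * loss_rate ** 2)
    return packet_size / denom

class TwoTimescaleRate:
    """Keep a responsive short-term and a smooth long-term average of the sending rate."""
    def __init__(self, alpha_short=0.5, alpha_long=0.05):
        self.short = self.long = None
        self.a_s, self.a_l = alpha_short, alpha_long

    def update(self, rate):
        self.short = rate if self.short is None else self.a_s * rate + (1 - self.a_s) * self.short
        self.long = rate if self.long is None else self.a_l * rate + (1 - self.a_l) * self.long
        return self.short, self.long

if __name__ == "__main__":
    avg = TwoTimescaleRate()
    for p in (0.01, 0.01, 0.05, 0.05, 0.01):          # varying loss measurements
        r = tcp_friendly_rate(packet_size=1460, rtt=0.1, loss_rate=p)
        print([round(x) for x in avg.update(r)])
```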

Journal ArticleDOI
TL;DR: It is demonstrated through analysis that data partitioning, which is an essential function of MRTP, can effectively reduce the short-range dependence of multimedia data, thus improving its queueing performance in underlying networks.
Abstract: Real-time multimedia transport has stringent quality of service requirements, which are generally not supported by current network architectures. In emerging mobile ad hoc networks, frequent topology changes and link failures cause severe packet losses, which degrade the quality of received media. However, in such mesh networks, there usually exist multiple paths between any source and destination nodes. Such path diversity has been demonstrated to be effective in combating congestion and link failures for improved media quality. In this paper, we present a new protocol to facilitate multipath transport of real-time multimedia data. The proposed protocol, the multiflow real-time transport protocol (MRTP), provides a convenient vehicle for real-time applications to partition and transmit data using multiple flows. We demonstrate through analysis that data partitioning, which is an essential function of MRTP, can effectively reduce the short-range dependence of multimedia data, thus improving its queueing performance in underlying networks. Furthermore, we show that a few flows are sufficient for MRTP to exploit most of the benefits of multipath transport. Finally, we present a comprehensive simulation study on the performance of MRTP under a mobile ad hoc network. We show that with one additional path, MRTP outperformed single-flow RTP by a significant margin.

Journal ArticleDOI
TL;DR: The optimal trade-off between bits allocated to audio and to video under global bitrate constraints is investigated and models for the interactions between audio and video in terms of perceived audiovisual quality are explored.
Abstract: This paper studies the quality of multimedia content at very low bitrates. We carried out subjective experiments for assessing audiovisual, audio-only, and video-only quality. We selected content and encoding parameters that are typical of mobile applications. Our focus was on the MPEG-4 AVC (a.k.a. H.264) and AAC coding standards. Based on these data, we first analyze the influence of video and audio coding parameters on quality. We investigate the optimal trade-off between bits allocated to audio and to video under global bitrate constraints. Finally, we explore models for the interactions between audio and video in terms of perceived audiovisual quality.

Journal ArticleDOI
Cormac Herley
TL;DR: It is demonstrated that it is perfectly feasible to identify, in real time, repeating objects (ROs) that occur days or even weeks apart in audio or video streams, and that the compute and buffering requirements are comfortably within reach for a basic desktop computer.
Abstract: Many media streams consist of distinct objects that repeat. For example, broadcast television and radio signals contain advertisements, call sign jingles, songs, and even whole programs that repeat. The problem we address is to explicitly identify the underlying structure in repetitive streams and de-construct them into their component objects. Our algorithm exploits dimension reduction techniques on the audio portion of a multimedia stream to make search and buffering feasible. Our architecture assumes no a priori knowledge of the streams, and does not require that the repeating objects (ROs) be known. Everything the system needs, including the position and duration of the ROs, is learned on the fly. We demonstrate that it is perfectly feasible to identify, in real time, ROs that occur days or even weeks apart in audio or video streams. Both the compute and buffering requirements are comfortably within reach for a basic desktop computer. We outline the algorithms, enumerate several applications and present results from real broadcast streams.

Journal ArticleDOI
TL;DR: Experimental results indicate that the architecture and protocols can be combined to yield voice quality on par with the public switched telephone network.
Abstract: The cost savings and novel features associated with voice over IP (VoIP) are driving its adoption by service providers. Unfortunately, the Internet's best effort service model provides no quality of service guarantees. Because low latency and jitter are the key requirements for supporting high-quality interactive conversations, VoIP applications use UDP to transfer data, thereby subjecting themselves to quality degradations caused by packet loss and network failures. In this paper, we describe an architecture to improve the performance of such VoIP applications. Two protocols are used for localized packet loss recovery and rapid rerouting in the event of network failures. The protocols are deployed on the nodes of an application-level overlay network and require no changes to the underlying infrastructure. Experimental results indicate that the architecture and protocols can be combined to yield voice quality on par with the public switched telephone network.

Journal ArticleDOI
TL;DR: An enhancement to the buffer-status based H.264/AVC bit allocation method is proposed by using a PSNR-based frame complexity estimation to improve the existing mean absolute difference based (MAD-based) complexity measure.
Abstract: This paper presents an efficient rate control scheme for H.264/AVC video coding in low-delay environments. In our scheme, we propose an enhancement to the buffer-status-based H.264/AVC bit allocation method. The enhancement uses a PSNR-based frame complexity estimation to improve the existing mean-absolute-difference-based (MAD-based) complexity measure. Bit allocation to each frame is not just computed from the encoder buffer status but is also adjusted by a combined frame complexity measure. To prevent the buffer from undesirable overflow or underflow under a small buffer size constraint in a low-delay environment, the computed quantization parameter (QP) for the current MB is adjusted based on actual encoding results at that point. We also propose to compare the bits produced by each mode with the average target bits per MB to dynamically modify the Lagrange multiplier (λ_MODE) for mode decision. The objective of the QP and λ_MODE adjustment is to produce bits as close to the frame target as possible, which is especially important for low-delay applications. Simulation results show that the H.264 coder, using our proposed scheme, obtains significant improvement in the mismatch ratio of target bits and actual bits in all testing cases, achieves a visual quality improvement of about 0.6 dB on average, performs better for buffer overflow and underflow, and achieves a similar or smaller PSNR deviation.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method is adequately satisfactory in terms of both precision versus recall and time needed for retrieval, and that it can be used for 3-D model search and retrieval in a highly efficient manner.
Abstract: Measuring the similarity between three-dimensional (3-D) objects is a challenging problem, with applications in computer vision, molecular biology, computer graphics, and many other areas. This paper describes a novel method for 3-D model content-based search based on the 3-D Generalized Radon Transform and a query-by-3-D-model approach. A set of descriptor vectors is extracted using the Radial Integration Transform (RIT) and the Spherical Integration Transform (SIT), which represent significant shape characteristics. After the proper alignment of the models, descriptor vectors are produced which are invariant in terms of translation, scaling and rotation. Experiments were performed using three different databases and comparing the proposed method with those most commonly cited in the literature. Experimental results show that the proposed method is adequately satisfactory in terms of both precision versus recall and time needed for retrieval, and that it can be used for 3-D model search and retrieval in a highly efficient manner.
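A loose voxel-grid approximation of radial- and spherical-integration style descriptors (binning and normalization are assumptions, not the paper's exact transforms): RIT-like values integrate occupancy along ray directions from the centroid, SIT-like values integrate occupancy over concentric shells.

```python
# Sketch: RIT/SIT-style descriptors from a binary voxel grid.
import numpy as np

def rit_sit(voxels, n_dirs=32, n_shells=16):
    idx = np.argwhere(voxels > 0).astype(float)
    center = idx.mean(axis=0)
    d = idx - center
    r = np.linalg.norm(d, axis=1) + 1e-9
    phi = np.arctan2(d[:, 1], d[:, 0]) + np.pi                 # azimuth in [0, 2*pi)
    dir_bin = np.floor(phi / (2 * np.pi) * n_dirs).astype(int) % n_dirs
    shell_bin = np.minimum((r / r.max() * n_shells).astype(int), n_shells - 1)
    rit = np.bincount(dir_bin, minlength=n_dirs).astype(float)
    sit = np.bincount(shell_bin, minlength=n_shells).astype(float)
    return rit / rit.sum(), sit / sit.sum()                    # normalized descriptors

if __name__ == "__main__":
    grid = np.zeros((32, 32, 32))
    x, y, z = np.ogrid[:32, :32, :32]
    grid[(x - 16) ** 2 + (y - 16) ** 2 + (z - 16) ** 2 < 100] = 1    # a solid sphere
    rit, sit = rit_sit(grid)
    print("RIT bins:", np.round(rit[:4], 3), "SIT shells:", np.round(sit[:4], 3))
```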