
Showing papers on "Video quality published in 2006"


Journal ArticleDOI
TL;DR: This paper presents the results of an extensive subjective quality assessment study in which a total of 779 distorted images were evaluated by about two dozen human subjects; it is the largest subjective image quality study in the literature in terms of number of images, distortion types, and number of human judgments per image.
Abstract: Measurement of visual quality is of fundamental importance for numerous image and video processing applications, where the goal of quality assessment (QA) algorithms is to automatically assess the quality of images or videos in agreement with human quality judgments. Over the years, many researchers have taken different approaches to the problem and have contributed significant research in this area, claiming to have made progress in their respective domains. It is important to evaluate the performance of these algorithms in a comparative setting and analyze the strengths and weaknesses of these methods. In this paper, we present results of an extensive subjective quality assessment study in which a total of 779 distorted images were evaluated by about two dozen human subjects. The "ground truth" image quality data obtained from about 25,000 individual human quality judgments is used to evaluate the performance of several prominent full-reference image quality assessment algorithms. To the best of our knowledge, apart from video quality studies conducted by the Video Quality Experts Group, the study presented in this paper is the largest subjective image quality study in the literature in terms of number of images, distortion types, and number of human judgments per image. Moreover, we have made the data from the study freely available to the research community, allowing other researchers to easily report comparative results in the future.
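A typical way such studies benchmark full-reference metrics is to correlate objective scores against the subjective scores. A minimal sketch of that evaluation step, in Python; the numbers below are illustrative stand-ins, not data from the study:

```python
# Compare an objective quality metric against subjective scores,
# as done when benchmarking full-reference QA algorithms.
from scipy.stats import pearsonr, spearmanr

dmos = [43.2, 12.5, 67.8, 30.1, 55.0]           # subjective difference scores
metric_scores = [0.61, 0.93, 0.32, 0.74, 0.48]  # e.g., SSIM per image

lcc, _ = pearsonr(metric_scores, dmos)     # prediction accuracy (linear)
srocc, _ = spearmanr(metric_scores, dmos)  # prediction monotonicity (rank)
print(f"LCC = {lcc:.3f}, SROCC = {srocc:.3f}")
```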

2,598 citations


Journal ArticleDOI
TL;DR: This paper proposes a context-based adaptive method to speed up multiple reference frame motion estimation, and shows that the proposed algorithm maintains essentially the same video quality as an exhaustive search over multiple reference frames.
Abstract: In the new video coding standard H.264/AVC, motion estimation (ME) is allowed to search multiple reference frames. The required computation therefore increases greatly, in proportion to the number of searched reference frames. However, the reduction in prediction residues depends mostly on the nature of the sequence, not on the number of searched frames. Sometimes the prediction residues can be greatly reduced, but frequently a lot of computation is wasted without any improvement in coding performance. In this paper, we propose a context-based adaptive method to speed up multiple reference frame ME. Statistical analysis is first applied to the information available for each macroblock (MB) after intra-prediction and inter-prediction from the previous frame. Context-based adaptive criteria are then derived to determine whether it is necessary to search more reference frames. The reference frame selection criteria are related to the selected MB modes, inter-prediction residues, intra-prediction residues, motion vectors of subpartitioned blocks, and quantization parameters. Many standard video sequences are tested as examples. The simulation results show that the proposed algorithm maintains essentially the same video quality as an exhaustive search over multiple reference frames, while avoiding 76%-96% of the computation spent searching unnecessary reference frames. Moreover, our fast reference frame selection is orthogonal to conventional fast block matching algorithms, and the two can easily be combined for further efficiency.
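A hedged sketch of what such a context-based early-termination rule could look like. The threshold names and values here are hypothetical; the paper derives its criteria statistically from MB modes, residues, motion vectors, and quantization parameters:

```python
# Hypothetical early-termination rule for multiple reference frame search.
def search_more_reference_frames(inter_residue, intra_residue,
                                 mv_subblocks, qp, t_residue=1.2):
    # If inter prediction from the previous frame is already much better
    # than intra prediction, extra reference frames rarely help.
    if inter_residue * t_residue < intra_residue:
        return False
    # Identical sub-block motion vectors suggest simple motion that a
    # single reference frame captures well.
    if len(set(mv_subblocks)) == 1:
        return False
    # Coarse quantization masks the small residue gains from extra frames.
    if qp >= 36:
        return False
    return True
```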

207 citations


Journal ArticleDOI
01 Mar 2006
TL;DR: The main idea is quick checking of the entire search range with simplified matching criterion to globally eliminate impossible candidates, followed by finer selection among potential best matched candidates.
Abstract: Block matching motion estimation is the heart of video coding systems. During the last two decades, hundreds of fast algorithms and VLSI architectures have been proposed. In this paper, we provide an extensive exploration of motion estimation together with our new developments. The main concepts behind fast algorithms can be classified into six categories: reduction in search positions, simplification of the matching criterion, bitwidth reduction, predictive search, hierarchical search, and fast full search. Comparisons of various algorithms in terms of video quality and computational complexity are given as useful guidelines for software applications. As for hardware implementations, full search architectures derived from systolic mapping are first introduced. The systolic arrays can be divided into inter-type and intra-type with 1-D, 2-D, and tree structures. Hexagonal plots are presented so that system designers can clearly evaluate the architectures in six aspects: gate count, required frequency, hardware utilization, memory bandwidth, memory bitwidth, and latency. Next, architectures supporting fast algorithms are reviewed. Finally, we propose our algorithmic and architectural co-development. The main idea is a quick check of the entire search range with a simplified matching criterion to globally eliminate impossible candidates, followed by finer selection among the potential best-matched candidates. The operations of the two stages are mapped onto the same hardware for resource sharing. Simulation results show that our design is ten times more area-speed efficient than full search architectures while delivering essentially the same video quality.
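The two-stage idea (coarse global elimination, then fine selection) can be illustrated with a small sketch: a subsampled SAD prunes the search range, and the full SAD decides among the survivors. The 4:1 subsampling and parameters are illustrative choices, not the paper's design:

```python
import numpy as np

def two_stage_search(cur_blk, ref, x, y, rng=8, keep=8):
    # cur_blk and ref are assumed to be float (or signed-int) arrays,
    # to avoid uint8 wraparound in the subtractions below.
    H, W = ref.shape
    N = cur_blk.shape[0]
    cands = []
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and ry + N <= H and 0 <= rx and rx + N <= W:
                blk = ref[ry:ry + N, rx:rx + N]
                # Stage 1: simplified criterion -- SAD on a 4:1 subsampled grid.
                coarse = np.abs(cur_blk[::2, ::2] - blk[::2, ::2]).sum()
                cands.append((coarse, dy, dx))
    cands.sort()  # globally eliminate unlikely candidates
    # Stage 2: full SAD only on the surviving candidates.
    best = min(cands[:keep], key=lambda c: np.abs(
        cur_blk - ref[y + c[1]:y + c[1] + N, x + c[2]:x + c[2] + N]).sum())
    return best[1], best[2]  # motion vector (dy, dx)
```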

199 citations


Journal ArticleDOI
TL;DR: The proposed approach solves some problems inherent to objective metrics that must predict subjective quality scores obtained using the single stimulus continuous quality evaluation (SSCQE) method, and relies on a convolutional neural network that allows continuous-time scoring of the video.
Abstract: This paper describes an application of neural networks to objective measurement methods designed to automatically assess the perceived quality of digital videos. This challenging task aims to emulate human judgment and to replace very complex and time-consuming subjective quality assessment. Several metrics have been proposed in the literature to tackle this issue. They are based on a general framework that combines different stages, each of them addressing complex problems. The ambition of this paper is not to present a globally perfect quality metric, but rather to focus on an original way to use neural networks in such a framework in the context of a reduced-reference (RR) quality metric. In particular, we point out the value of such a tool for combining features and pooling them in order to compute quality scores. The proposed approach solves some problems inherent to objective metrics that must predict subjective quality scores obtained using the single stimulus continuous quality evaluation (SSCQE) method. The latter has been adopted by the Video Quality Experts Group (VQEG) in its recently finalized reduced-reference and no-reference (RRNR-TV) test plan. The originality of this approach, compared to previous attempts to use neural networks for quality assessment, lies in the use of a convolutional neural network (CNN) that allows continuous-time scoring of the video. Objective features are extracted on a frame-by-frame basis from both the reference and the distorted sequences; they are derived from a perceptual-based representation and integrated along the temporal axis using a time-delay neural network (TDNN). Experiments conducted on different MPEG-2 videos, with bit rates ranging from 2 to 6 Mb/s, show the effectiveness of the proposed approach in obtaining a plausible model of temporal pooling from the human visual system (HVS) point of view. More specifically, a linear correlation coefficient between objective and subjective scores of up to 0.92 has been obtained on a set of typical TV videos.
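A toy illustration of the temporal-pooling idea: frame-level features convolved with a temporal window, echoing the TDNN's convolution over time. The window weights below are a stand-in for learned weights:

```python
import numpy as np

def temporal_pool(frame_features, window=5):
    # A smooth temporal window; in the paper these weights are learned.
    w = np.hanning(window)
    w /= w.sum()
    # 'same' keeps one estimate per frame: continuous-time scoring.
    return np.convolve(frame_features, w, mode="same")

per_frame_distortion = np.random.rand(250)  # e.g., 10 s of 25 fps video
continuous_score = temporal_pool(per_frame_distortion)
```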

197 citations


Patent
23 Jan 2006
TL;DR: In this article, a novel peer-to-peer video streaming scheme is described in which each peer stores and streams videos to requesting client peers; each video is encoded into multiple descriptions and each description is placed on a different node.
Abstract: A novel peer-to-peer video streaming scheme is described in which each peer stores and streams videos to requesting client peers. Each video is encoded into multiple descriptions, and each description is placed on a different node. If a serving peer disconnects in the middle of a streaming session, the system searches for a replacement peer that stores the same video description and has sufficient uplink bandwidth. Employing multiple description coding in a peer-to-peer network improves the robustness of the distributed streaming content when a serving peer is lost: video quality can be maintained even as server peers drop out. The video codec design and network policies have a significant effect on the streamed video quality. System performance generally improves as the number of descriptions M for the video increases, which implies that higher video quality can be obtained at the same network loading.
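A minimal sketch of the replacement-peer search described above; the Peer record and its fields are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Peer:
    peer_id: str
    descriptions: set    # description indices this peer stores
    uplink_kbps: float   # spare uplink bandwidth

def find_replacement(peers, lost_description, required_kbps):
    # Candidates must store the lost description and have enough uplink.
    candidates = [p for p in peers
                  if lost_description in p.descriptions
                  and p.uplink_kbps >= required_kbps]
    # Prefer the candidate with the most spare uplink, if any exists.
    return max(candidates, key=lambda p: p.uplink_kbps, default=None)
```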

177 citations


Journal ArticleDOI
TL;DR: This work designs models using the results of a subjective test based on 1080 packet losses in 72 minutes of video, and develops three methods, which differ in the amount of information available to them.
Abstract: We consider the problem of predicting packet loss visibility in MPEG-2 video. We use two modeling approaches: CART and GLM. The former classifies each packet loss as visible or not; the latter predicts the probability that a packet loss is visible. For each modeling approach, we develop three methods, which differ in the amount of information available to them. A reduced reference method has access to limited information based on the video at the encoder's side and has access to the video at the decoder's side. A no-reference pixel-based method has access to the video at the decoder's side but lacks access to information at the encoder's side. A no-reference bitstream-based method does not have access to the decoded video either; it has access only to the compressed video bitstream, potentially affected by packet losses. We design our models using the results of a subjective test based on 1080 packet losses in 72 minutes of video.
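The GLM approach amounts to a logistic model mapping loss features to a visibility probability. A sketch with hypothetical feature names and coefficients; the paper fits its model to the subjectively rated losses:

```python
import math

def visibility_probability(features, coefs, intercept):
    # Generalized linear model with a logit link: P(visible) = sigmoid(z).
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# e.g., features = [motion magnitude, residual energy, loss duration]
p = visibility_probability([2.1, 0.8, 4.0], [0.9, 1.3, 0.2], -3.0)
```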

167 citations


Journal ArticleDOI
TL;DR: The proposed model incorporates the spatio-temporal contrast sensitivity function, the influence of eye movements, luminance adaptation, and contrast masking to be more consistent with human perception and is capable of yielding JNDs for both still images and video with significant motion.
Abstract: Just-noticeable distortion (JND), which refers to the maximum distortion that the human visual system (HVS) cannot perceive, plays an important role in perceptual image and video processing. In comparison with JND estimation for images, estimating the JND profile for video must take into account the temporal properties of the HVS in addition to the spatial ones. In this paper, we develop a spatio-temporal model that estimates JND in the discrete cosine transform domain. The proposed model incorporates the spatio-temporal contrast sensitivity function, the influence of eye movements, luminance adaptation, and contrast masking in order to be more consistent with human perception. It is capable of yielding JNDs for both still images and video with significant motion. The experiments conducted in this study demonstrate that the JND values the model estimates for video sequences with moving objects are in line with HVS perception. Accurate JND estimation of video toward the actual visibility bounds can be translated into resource savings (e.g., for bandwidth/storage or computation) and performance improvements in video coding and other visual processing tasks (such as perceptual quality evaluation, visual signal restoration/enhancement, watermarking, authentication, and error protection).
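The multiplicative structure of such a JND model can be sketched as a base CSF threshold scaled by luminance adaptation and contrast masking; the component functions below are toy placeholders, not the paper's formulas:

```python
def jnd_threshold(base_csf_thresh, bg_luminance, masking_contrast):
    # Luminance adaptation: thresholds rise in very dark and very bright
    # regions (toy model around a mid-gray of 128).
    lum = 1.0 + abs(bg_luminance - 128.0) / 128.0
    # Contrast masking: stronger local activity hides larger distortions.
    mask = max(1.0, masking_contrast ** 0.6)
    # The full model also modulates this by eye movement and temporal CSF.
    return base_csf_thresh * lum * mask
```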

161 citations


Journal ArticleDOI
TL;DR: The results demonstrate that adaptation using the OAT outperforms conventional adaptation strategies in which only a single aspect of the video quality is adapted, giving better user-perceived quality in both the short and long term.
Abstract: In general, video quality adaptation and video quality evaluation are distinct activities. Most adaptive delivery mechanisms for streaming multimedia content do not explicitly consider user-perceived quality when making adaptation decisions. Equally, video quality evaluation techniques are not designed to evaluate instantaneous quality while the quality is changing over time. We propose that an Optimal Adaptation Trajectory (OAT) through the set of possible encodings exists, and that it indicates how to adapt encoding quality in response to changes in network conditions in order to maximize user-perceived quality. The subjective and objective tests carried out to find such trajectories for a number of different MPEG-4 video clips are described. Experimental subjective testing results are presented that demonstrate the dynamic nature of user perception with adapting multimedia. The results demonstrate that adaptation using the OAT outperforms conventional adaptation strategies in which only a single aspect of the video quality is adapted. In contrast, the OAT provides a mechanism to adapt multiple aspects of the video quality, thereby giving better user-perceived quality in both the short and long term.

130 citations


Patent
Yiliang Bao1, Yan Ye1
18 Aug 2006
TL;DR: In this article, video coding techniques that support spatial scalability using a generalized fine granularity scalability (FGS) approach are proposed, allowing spatially scalable enhancement layers to be coded and truncated as FGS layers rather than discrete layers.
Abstract: The disclosure is directed to video coding techniques that support spatial scalability using a generalized fine granularity scalability (FGS) approach. Various degrees of spatial scalability can be achieved by sending spatially scalable enhancement layers in a generalized FGS format. Spatially scalable enhancement bitstreams can be arbitrarily truncated to conform to network conditions, channel conditions and/or decoder capabilities. Coding coefficients and syntax elements for spatial scalability can be embedded in a generalized FGS format. For good network or channel conditions, and/or enhanced decoder capabilities, additional bits received via one or more enhancement layers permit encoded video to be reconstructed with increased spatial resolution and continuously improved video quality across different spatial resolutions. The techniques permit spatial scalability layers to be coded as FGS layers, rather than discrete layers, permitting arbitrary scalability. The techniques may include features to curb error propagation that may otherwise arise due to partial decoding.

120 citations


Journal ArticleDOI
TL;DR: The results demonstrate the merits and the need for cross-layer optimization in order to provide an efficient solution for real-time video transmission using existing protocols and infrastructures and provide important insights for future protocol and system design targeted at enhanced video streaming support across wireless mesh networks.
Abstract: The proliferation of wireless multihop communication infrastructures in office or residential environments depends on their ability to support a variety of emerging applications requiring real-time video transmission between stations located across the network. We propose an integrated cross-layer optimization algorithm aimed at maximizing the decoded video quality of delay-constrained streaming in a multihop wireless mesh network that supports quality-of-service. The key principle of our algorithm lies in the synergistic optimization of different control parameters at each node of the multihop network, across the protocol layers (application, network, medium access control, and physical), as well as end-to-end across the various nodes. To drive this optimization, we assume an overlay network infrastructure that is able to convey information on the conditions of each link. Various scenarios that perform the integrated optimization using different levels ("horizons") of information about the network status are examined. The differences between several optimization scenarios in terms of decoded video quality and required streaming complexity are quantified. Our results demonstrate the merits of, and the need for, cross-layer optimization in order to provide an efficient solution for real-time video transmission using existing protocols and infrastructures. In addition, they provide important insights for future protocol and system design targeted at enhanced video streaming support across wireless mesh networks.

Journal ArticleDOI
TL;DR: An optimization framework is proposed, which enables the multiple senders to coordinate their packet transmission schedules, such that the average quality over all video clients is maximized, and is very efficient in terms of video quality.
Abstract: We consider the problem of distributed packet selection and scheduling for multiple video streams sharing a communication channel. An optimization framework is proposed, which enables the multiple senders to coordinate their packet transmission schedules, such that the average quality over all video clients is maximized. The framework relies on rate-distortion information that is used to characterize a video packet. This information consists of two quantities: the size of the packet in bits, and its importance for the reconstruction quality of the corresponding stream. A distributed streaming strategy then allows for trading off rate and distortion, not only within a single video stream, but also across different streams. Each of the senders allocates to its own video packets a share of the available bandwidth on the channel in proportion to their importance. We evaluate the performance of the distributed packet scheduling algorithm for two canonical problems in streaming media, namely adaptation to available bandwidth and adaptation to packet loss through prioritized packet retransmissions. Simulation results demonstrate that, for the difficult case of scheduling nonscalably encoded video streams, our framework is very efficient in terms of video quality, both over all streams jointly and also over the individual videos. Compared to a conventional streaming system that does not consider the relative importance of the video packets, the gains in performance range up to 6 dB for the scenario of bandwidth adaptation, and even up to 10 dB for the scenario of random packet loss adaptation.
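The proportional-share rule is simple to state in code. A minimal sketch; the stream names and numbers are illustrative:

```python
def allocate_shares(channel_kbps, importance_per_stream):
    # Each sender takes a slice of channel bandwidth proportional to the
    # rate-distortion importance of its queued packets.
    total = sum(importance_per_stream.values())
    return {s: channel_kbps * imp / total
            for s, imp in importance_per_stream.items()}

shares = allocate_shares(2000, {"stream_a": 5.0, "stream_b": 1.0})
# stream_a gets ~1667 kb/s, stream_b ~333 kb/s
```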

Journal ArticleDOI
TL;DR: GRACE-1 as discussed by the authors is a cross-layer adaptation framework that coordinates the adaptation of the CPU hardware, OS scheduling, and multimedia quality based on users' preferences to balance the benefits and overhead of cross layer adaptation.
Abstract: Mobile devices that primarily process multimedia data need to support multimedia quality with limited battery energy. To address this challenging problem, researchers have introduced adaptation into multiple system layers, ranging from hardware to applications. Given these adaptive layers, a new challenge is how to coordinate them to fully exploit the benefits of adaptation. This paper presents a novel cross-layer adaptation framework, called GRACE-1, that coordinates the adaptation of the CPU hardware, OS scheduling, and multimedia quality based on users' preferences. To balance the benefits and overhead of cross-layer adaptation, GRACE-1 takes a hierarchical approach: it globally adapts all three layers to large system changes, such as application entry or exit, and internally adapts individual layers to small changes in the processed multimedia data. We have implemented GRACE-1 on an HP laptop with an adaptive Athlon CPU, a Linux-based OS, and video codecs. Our experimental results show that, compared to schemes that adapt only some layers or only to large changes, GRACE-1 reduces the laptop's energy consumption by up to 31.4 percent while providing better or the same video quality.

Journal ArticleDOI
TL;DR: A computationally efficient video distortion metric that can operate in full- or reduced-reference mode as required, based on a model of the human visual system implemented using the wavelet transform and separable filters is presented.
Abstract: Video distortion metrics based on models of the human visual system have traditionally used comparisons between the distorted signal and a reference signal to calculate distortions objectively. In video coding applications, this is not prohibitive. In quality monitoring applications, however, access to the reference signal is often limited. This paper presents a computationally efficient video distortion metric that can operate in full- or reduced-reference mode as required. The metric is based on a model of the human visual system implemented using the wavelet transform and separable filters. The visual model is parameterized using a set of video frames and the associated quality scores. The visual model's hierarchical structure, as well as the limited impact of fine scale distortions on quality judgments of severely impaired video, are exploited to build a framework for scaling the bitrate required to represent the reference signal. Two applications of the metric are also presented. In the first, the metric is used as the distortion measure in a rate-distortion optimized rate control algorithm for MPEG-2 video compression. The resulting compressed video sequences demonstrate significant improvements in visual quality over compressed sequences with allocations determined by the TM5 rate control algorithm operating with MPEG-2 at the same rate. In the second, the metric is used to estimate time series of objective quality scores for distorted video sequences using reference bitrates as low as 10 kb/s. The resulting quality scores more accurately model subjective quality recordings than do those estimated using the mean squared error as a distortion metric, while requiring a fraction of the bitrate used to represent the reference signal. The reduced-reference metric's performance is comparable to that of the full-reference metrics tested in the first Video Quality Experts Group evaluation.

Proceedings ArticleDOI
26 Mar 2006
TL;DR: The structural similarity index is used as a measure for perceptual similarity and a multiscale algorithm for obtaining a perceptual disparity map and a stereo-similarity map to be used in the suggested metric is designed.
Abstract: We propose a compound full-reference stereo-video quality metric composed of two components: a monoscopic quality component and a stereoscopic quality component. While the former assesses the familiar monoscopic perceived distortions caused by blur, noise, contrast change, etc., the latter assesses the perceived degradation of binocular depth cues only. We use the structural similarity index as a measure of perceptual similarity and design a multiscale algorithm for obtaining a perceptual disparity map and a stereo-similarity map to be used in the suggested metric. We verify the performance of the metric with subjective tests on distorted stereo images and coded stereo-video sequences, with the final aim of building perceptually-aware feedback for an H.264-based stereo-video encoder.
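The building block is the structural similarity (SSIM) index; a minimal global-statistics SSIM for grayscale images is sketched below (the paper applies it multiscale and to disparity and stereo-similarity maps):

```python
import numpy as np

def ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    # Standard SSIM constants; x and y are grayscale arrays of equal shape.
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```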

Patent
06 Sep 2006
TL;DR: In this article, a multi-modal quality estimation unit estimates a multi-modal quality value on the basis of an audio quality evaluation value and a video quality evaluation value; a video communication quality value is then estimated from the multi-modal quality value and a delay quality degradation amount.
Abstract: A multi-modal quality estimation unit (11) estimates a multi-modal quality value (23A) on the basis of an audio quality evaluation value (21A) and a video quality evaluation value (21B). In addition, a delay quality degradation amount estimation unit (12) estimates a delay quality degradation amount (23B) on the basis of an audio delay time (22A) and a video delay time (22B). A video communication quality estimation unit (13) estimates a video communication quality value (24) on the basis of the multi-modal quality value (23A) and the delay quality degradation amount (23B).

Journal ArticleDOI
TL;DR: The algorithm integrates two new techniques: i) a utility-based model using the rate-distortion function as the application utility measure for optimizing the overall video quality; and ii) a two-timescale approach of rate averages to satisfy both media and TCP-friendliness.
Abstract: This paper presents a media- and TCP-friendly rate-based congestion control algorithm (MTFRCC) for scalable video streaming in the Internet. The algorithm integrates two new techniques: i) a utility-based model using the rate-distortion function as the application utility measure for optimizing overall video quality; and ii) a two-timescale approach to rate averaging (long-term and short-term) to satisfy both media- and TCP-friendliness. We evaluate our algorithm through simulation and compare the results against the TCP-friendly rate control (TFRC) algorithm. For assessment, we consider five criteria: TCP fairness, responsiveness, aggressiveness, overall video quality, and smoothness of the resulting bit rate. Our simulation results show that MTFRCC performs better than TFRC at various congestion levels, including an improvement in overall video quality.
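A sketch of the two-timescale idea: a fast short-term average for media-friendliness and a slow long-term average for TCP-friendliness. The smoothing factors and the capping rule are illustrative assumptions, not the paper's design:

```python
class TwoTimescaleRate:
    def __init__(self, a_short=0.5, a_long=0.05):
        self.a_short, self.a_long = a_short, a_long
        self.short = self.long = 0.0

    def update(self, sample_kbps):
        # Exponentially weighted averages at two timescales.
        self.short += self.a_short * (sample_kbps - self.short)
        self.long += self.a_long * (sample_kbps - self.long)
        # Send near the short-term rate, capped by the TCP-fair
        # long-term rate (the 1.2 headroom factor is hypothetical).
        return min(self.short, 1.2 * self.long)
```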

Proceedings ArticleDOI
09 Jul 2006
TL;DR: This paper presents a framework for efficiently streaming scalable video from multiple servers over heterogeneous network paths, and proposes to use rateless codes, or Fountain codes, such that each server acts as an independent source, without the need to coordinate its sending strategy with other servers.
Abstract: This paper presents a framework for efficiently streaming scalable video from multiple servers over heterogeneous network paths. We propose to use rateless codes, or Fountain codes, so that each server acts as an independent source, without the need to coordinate its sending strategy with other servers. In this case, the problem of maximizing the received video quality while minimizing bandwidth usage reduces to a rate allocation problem. We provide an optimal solution for an ideal scenario in which the loss probability on each server-client path is exactly known. We then present a heuristic-based algorithm, which implements an unequal error protection scheme for the more realistic case of imperfect knowledge of the loss probabilities. Simulation results demonstrate the efficiency of the proposed algorithm in distributed streaming scenarios over lossy channels.
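In the ideal case with known loss probabilities, rateless coding makes only the total useful received rate matter, so one reasonable allocation fills the most reliable paths first. A hedged sketch under that assumption; inputs are illustrative:

```python
def allocate_rates(target_kbps, paths):
    # paths: list of (loss_probability, capacity_kbps), one per server.
    # Useful rate on path i is (1 - p_i) * r_i, since any received
    # rateless symbols contribute to decoding.
    alloc, needed = [], target_kbps
    for p, cap in sorted(paths):  # most reliable paths first
        r = min(cap, needed / (1.0 - p))
        alloc.append(r)
        needed -= r * (1.0 - p)
        if needed <= 1e-9:
            break
    return alloc  # may fall short if total capacity is insufficient

print(allocate_rates(1000, [(0.05, 600), (0.20, 800)]))
# -> [600, 537.5]: the lossier path carries the remainder, inflated by 1/(1-p)
```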

Journal ArticleDOI
TL;DR: IPTV service assurance can encompass much more, including subscriber management and authorization, capacity management, perceived video picture quality, and error correction and concealment; accomplished through integrated test and monitoring for rapid resolution of customer complaints.
Abstract: This article discusses service assurance aspects specific to IPTV services and video quality. Classic network monitoring generally ensures that each network element, network segment, and subnetwork is functioning reliably, and may also encompass routing and reachability across different network domains. IPTV service assurance can encompass much more, including subscriber management and authorization, capacity management, perceived video picture quality, and error correction and concealment, accomplished through integrated test and monitoring for rapid resolution of customer complaints. All of this is needed to ensure that the customer has an overall video quality of experience (QoE) at least as enjoyable as current TV delivery methods.

Journal ArticleDOI
TL;DR: The optimal trade-off between bits allocated to audio and to video under global bitrate constraints is investigated and models for the interactions between audio and video in terms of perceived audiovisual quality are explored.
Abstract: This paper studies the quality of multimedia content at very low bitrates. We carried out subjective experiments for assessing audiovisual, audio-only, and video-only quality. We selected content and encoding parameters that are typical of mobile applications. Our focus was on the MPEG-4 AVC (a.k.a. H.264) and AAC coding standards. Based on these data, we first analyze the influence of video and audio coding parameters on quality. We then investigate the optimal trade-off between bits allocated to audio and to video under global bitrate constraints. Finally, we explore models for the interactions between audio and video in terms of perceived audiovisual quality.

Proceedings ArticleDOI
13 Sep 2006
TL;DR: A hierarchical parallelization of H.264 encoders, well suited to low-cost clusters, is proposed; it offers a compromise between speed-up and latency so that a broader spectrum of applications can be covered.
Abstract: The latest generation of video encoding standards increases computing demands in order to reach the limits of compression efficiency. This is particularly the case for the H.264/AVC specification, which is gaining interest in industry. We are interested in applying parallel processing to H.264 encoders in order to fulfill the computation requirements imposed by demanding applications like video on demand, videoconferencing, and live broadcast. For a given delivered video quality and bit rate, the main complexity parameters are image resolution, frame rate, and latency. These parameters can be pushed beyond what available special-purpose hardware solutions support. Parallel processing based on off-the-shelf components is a more flexible, general-purpose alternative. In this work we propose a hierarchical parallelization of H.264 encoders that is very well suited to low-cost clusters. Our proposal uses MPI message-passing parallelization at two levels: GOP and frame. The GOP level encodes several groups of consecutive frames simultaneously, and the frame level encodes several slices of one frame in parallel. In previous work we found that GOP parallelism alone gives good speed-up but imposes very high latency, while frame parallelism is less efficient but has low latency. By combining both approaches we obtain a compromise between speed-up and latency, so that a broader spectrum of applications can be covered.
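The two-level decomposition can be sketched structurally. The paper uses MPI on a cluster; in this sketch, processes stand in for GOP-level ranks, threads stand in for slice-level workers, and encode_slice is a placeholder for the real encoder:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def encode_slice(gop_id, frame_id, slice_id):
    return (gop_id, frame_id, slice_id)  # stand-in for encoded bits

def encode_gop(args):
    gop_id, frames, slices = args
    # Frame level: encode the slices of each frame in parallel.
    with ThreadPoolExecutor(max_workers=slices) as pool:
        return [list(pool.map(lambda s: encode_slice(gop_id, f, s),
                              range(slices)))
                for f in range(frames)]

if __name__ == "__main__":
    gops = [(g, 12, 4) for g in range(8)]  # 8 GOPs, 12 frames, 4 slices
    # GOP level: encode several GOPs simultaneously.
    with ProcessPoolExecutor() as pool:
        bitstreams = list(pool.map(encode_gop, gops))
```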

Journal ArticleDOI
TL;DR: The proposed backlight scaling technique is capable of efficiently computing the flickering effect online and subsequently using a measure of the temporal distortion to appropriately adjust the slack on the intra-frame spatial distortion, thereby, achieving a good balance between the two sources of distortion while maximizing the backlight dimming-driven energy saving in the display system and meeting an overall video quality figure of merit.
Abstract: Liquid crystal displays (LCDs) have appeared in applications ranging from medical equipment to automobiles, gas pumps, laptops, and handheld portable computers. These display components present a cascaded energy attenuator to the battery of the handheld device and are responsible for about half of the energy drain at maximum display intensity. As such, the display components are the main focus of any effort to maximize an embedded system's battery lifetime. This paper proposes an approach for pixel transformation of the displayed image to increase the potential energy saving of the backlight scaling method. The proposed approach takes advantage of human visual system (HVS) characteristics and tries to minimize the distortion between the perceived brightness values of the individual pixels in the original image and those in the backlight-scaled image. This is in contrast to previous backlight scaling approaches, which simply match the luminance values of the individual pixels in the original and backlight-scaled images. Furthermore, this paper proposes a temporally-aware backlight scaling technique for video streams. The goal is to maximize energy saving in the display system by means of dynamic backlight dimming subject to a video distortion tolerance. The video distortion comprises: 1) an intra-frame (spatial) distortion component due to frame-sensitive backlight scaling and transmittance function tuning; and 2) an inter-frame (temporal) distortion component due to large-step backlight dimming across frames, modulated by the psychophysical characteristics of the human visual system. The proposed backlight scaling technique is capable of efficiently computing the flickering effect online and subsequently using a measure of the temporal distortion to appropriately adjust the slack on the intra-frame spatial distortion, thereby achieving a good balance between the two sources of distortion while maximizing the backlight dimming-driven energy saving in the display system and meeting an overall video quality figure of merit. The proposed dynamic backlight scaling approach is amenable to highly efficient hardware realization and has been implemented on the Apollo Testbed II. Actual current measurements demonstrate the effectiveness of the proposed technique compared to previous backlight dimming techniques, which ignored the temporal distortion effect.
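A hedged sketch of the temporally-aware idea: choose the dimmest backlight whose pixel compensation keeps spatial distortion within a tolerance, then clamp the frame-to-frame step to limit flicker. The percentile rule below is a crude stand-in for the paper's HVS-based distortion measures:

```python
import numpy as np

def backlight_level(frame, spatial_tol=0.02, prev=None, max_step=0.1):
    # Dimming to the 98th-percentile luminance clips ~2% of pixels even
    # after transmittance compensation: a crude spatial-distortion proxy.
    level = np.percentile(frame, 100 * (1 - spatial_tol)) / 255.0
    if prev is not None:
        # Temporal clamp: limit large backlight steps across frames,
        # which the HVS perceives as flicker.
        level = float(np.clip(level, prev - max_step, prev + max_step))
    return float(level)
```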

Proceedings ArticleDOI
05 Jun 2006
TL;DR: A novel, realistic simulation tool-set for evaluating delivered video quality over wireless networks that integrates EvalVid and NS-2; the quality of video transmission with burst packet errors is found to be superior to that with randomly distributed packet errors at the same packet error rate.
Abstract: The objective of this paper is to present a novel, realistic simulation tool-set for evaluating delivered video quality over wireless networks. This tool-set integrates EvalVid and NS-2. With this integration, researchers can easily analyze their designed mechanisms, such as network protocols or QoS control schemes, in a realistic simulation environment. We use a case study of video transmission over a wireless network to demonstrate simulation with the tool-set. From the results, we find that the quality of video transmission with burst packet errors is superior to that with randomly distributed packet errors at the same packet error rate. In addition, at the same packet error rate, unicast transmission leads to better delivered video quality than multicast transmission because of retransmission. Using the tool-set, researchers can also assess video quality not only with the evaluation metrics but also with real video sequences. In brief, researchers who utilize our proposed QoS assessment framework will benefit when verifying their designs for video transmission over wireless networks.

Journal ArticleDOI
TL;DR: This paper addresses the problem of unequal error protection for scalable video transmission over a wireless packet-erasure channel, using a genetic algorithm to quickly find an allocation pattern that is hard to obtain with conventional methods such as hill climbing.
Abstract: In this paper, we address the problem of unequal error protection (UEP) for scalable video transmission over a wireless packet-erasure channel. Unequal amounts of protection are allocated to the different frames (I- or P-frames) of a group-of-pictures (GOP), and within each frame, unequal amounts of protection are allocated to the progressive bit-stream of scalable video to provide graceful degradation of video quality as the packet loss rate varies. We use a genetic algorithm (GA) to quickly find the allocation pattern, which is hard to obtain with conventional methods such as hill climbing. Theoretical analysis and experimental results both demonstrate the advantage of the proposed algorithm.
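A toy, mutation-only GA for this kind of allocation problem is sketched below. The fitness model (progressive frame dependency, binomial packet losses) is a simplification of the paper's setup, and all constants are illustrative:

```python
import math, random

K, BUDGET, P_LOSS = 8, 12, 0.1   # data pkts/frame, parity budget, loss rate
VALUES = [5.0, 3.0, 2.0, 1.0]    # I-frame first, then P-frames

def p_decoded(parity):
    # Frame decodes iff losses among its K+parity packets <= parity.
    n = K + parity
    return sum(math.comb(n, i) * P_LOSS**i * (1 - P_LOSS)**(n - i)
               for i in range(parity + 1))

def fitness(alloc):
    # P-frames are useful only if all preceding frames decoded.
    q, total = 1.0, 0.0
    for v, f in zip(VALUES, alloc):
        q *= p_decoded(f)
        total += v * q
    return total

def random_alloc():
    # Random split of the parity budget across the frames.
    cuts = sorted(random.randint(0, BUDGET) for _ in range(len(VALUES) - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [BUDGET])]

def mutate(alloc):
    # Move one parity packet between two frames.
    a = alloc[:]
    i, j = random.randrange(len(a)), random.randrange(len(a))
    if a[i] > 0:
        a[i] -= 1
        a[j] += 1
    return a

pop = [random_alloc() for _ in range(40)]
for _ in range(60):  # evolve: keep elites, mutate them, add fresh blood
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(30)]
print(max(pop, key=fitness))  # e.g., most parity on the I-frame
```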

Proceedings ArticleDOI
01 Oct 2006
TL;DR: A model for predicting the visibility of multiple packet losses is proposed and its performance is demonstrated on dual losses; the probability that a loss is visible is predicted using a generalized linear model.
Abstract: We consider modeling the visibility of individual and multiple packet losses in H.264 videos. We propose a model for predicting the visibility of multiple packet losses and demonstrate its performance on dual losses (two nearby packet losses). We extract the factors affecting visibility using a reduced-reference method. We predict the probability that a loss is visible using a generalized linear model. We achieve MSE values (between actual and predicted probabilities) of 0.0253 and 0.0398 for individual and dual losses, respectively. We also examine the effect of various factors on visibility.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: This paper presents a subjective test set up to estimate the effect of frame freeze and frame skip on video quality, showing that different frame freeze and frame skip situations impair video quality to different degrees.
Abstract: Apparent motion discontinuities in video, such as frame freezing and frame skipping, are a common temporal degradation in video transmission over Internet and mobile channels. The end-user perceives a break in the fluidity of the visual information, which affects his or her quality assessment of the delivered video sequence. A subjective video test based on SAMVIQ (Subjective Assessment Methodology for VIdeo Quality) [1] was set up to estimate the effect of frame freeze and frame skip on video quality. In this paper, we present this subjective test and deduce that different frame freeze and frame skip situations impair video quality to different degrees.

Proceedings ArticleDOI
18 Apr 2006
TL;DR: The effect of burst packet losses on delivered video quality is less than that of distributed packet losses at the same packet loss rate, and a smaller play-out buffer leads to more packet drops and worse video quality.
Abstract: The purpose of this paper is to study the effect of packet loss on MPEG video transmission quality in wireless networks. First, we consider the distribution of packet losses in a wireless network, including distributed and burst packet losses. We also discuss the additional packet drops caused by the play-out buffer at the receiving end, and the effect of the transmission packet size on delivered video quality. From the results, we find that the effect of burst packet losses on delivered video quality is less than that of distributed packet losses at the same packet loss rate. Moreover, a smaller play-out buffer leads to more packet drops and worse video quality. Finally, if there is no video recovery during transmission, the delivered video quality with larger packets will be better than with smaller packets.

Patent
27 Sep 2006
TL;DR: In this article, an Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system was proposed to improve the modeling of moving objects, compression efficiency and reconstructed video quality.
Abstract: An Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system that utilizes various motion models, such as affine models, in addition to video coding and pre-processing operations at the video encoder to exploit the FRUC processing that will occur in the decoder in order to improve the modeling of moving objects, compression efficiency and reconstructed video quality. Furthermore, objects are identified in a way that reduces the amount of information necessary for encoding to render the objects on the decoder device.

Patent
20 Jul 2006
TL;DR: In this paper, a system and method for diagnosis of video device performance in the transfer of audio visual data over a video network physically interfaces with the network to receive audio-visual data associated with the video device of interest.
Abstract: A system and method for diagnosing video device performance in the transfer of audio-visual data over a video network. The system physically interfaces with the network to receive audio-visual data associated with the video device of interest, and uses diagnostic tools to access the audio-visual data and determine performance statistics, including video device jitter, latency, throughput, and packet loss, through analysis of the accessed data.

Journal ArticleDOI
Chulhee Lee1, Sungdeuk Cho1, Jihwan Choe1, Taeuk Jeong1, Wonseok Ahn1, Eunjae Lee1 
TL;DR: Experiments show that the proposed method significantly outperforms the conventional peak signal-to-noise ratio (PSNR) and was included in international recommendations for objective video quality measurement.
Abstract: We propose a new method for objective measurement of video quality. By analyzing subjective scores for various video sequences, we find that the human visual system is particularly sensitive to degradation around edges. In other words, when the edge areas of a video sequence are degraded, evaluators tend to give the video low quality scores, even though the overall mean squared error is not large. Based on this observation, we propose an objective video quality measurement method based on degradation around edges. In the proposed method, we first apply an edge detection algorithm to the videos and locate the edge areas. Then, we measure the degradation of those edge areas by computing mean squared errors and use this, after some postprocessing, as a video quality metric. Experiments show that the proposed method significantly outperforms the conventional peak signal-to-noise ratio (PSNR). The method was also evaluated by independent laboratory groups in the Video Quality Experts Group (VQEG) Phase 2 test, where it consistently performed well. As a result, the method was included in international recommendations for objective video quality measurement.
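The described pipeline (edge detection, MSE restricted to edge areas, PSNR-style mapping) can be sketched compactly. The gradient threshold is illustrative, and the paper's postprocessing is omitted:

```python
import numpy as np

def edge_mse_quality(ref, dist, grad_thresh=40.0):
    # Crude edge map from the reference frame's gradient magnitude.
    gy, gx = np.gradient(ref.astype(float))
    edges = np.hypot(gx, gy) > grad_thresh
    if not edges.any():
        return float("inf")  # no edges: nothing to penalize in this sketch
    # MSE computed only over edge pixels, then mapped to a PSNR-style score.
    err = (ref.astype(float) - dist.astype(float))[edges]
    mse = max(float((err ** 2).mean()), 1e-10)
    return 10.0 * np.log10(255.0 ** 2 / mse)  # edge PSNR, in dB
```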