
Showing papers on "Video quality published in 2009"


Proceedings ArticleDOI
19 Apr 2009
TL;DR: A distributed compressive video sensing (DCVS) framework is proposed to simultaneously capture and compress video data, where almost all computation burdens can be shifted to the decoder, resulting in a very low-complexity encoder.
Abstract: Low-complexity video encoding is desirable for several emerging applications. Recently, distributed video coding (DVC) has been proposed to reduce encoding complexity to the order of that for still image encoding. In addition, compressive sensing (CS) has been applied to directly capture compressed image data efficiently. In this paper, by integrating the respective characteristics of DVC and CS, a distributed compressive video sensing (DCVS) framework is proposed to simultaneously capture and compress video data, where almost all of the computational burden can be shifted to the decoder, resulting in a very low-complexity encoder. At the decoder, compressed video can be efficiently reconstructed using the modified GPSR (gradient projection for sparse reconstruction) algorithm. With the assistance of the proposed initialization and stopping criteria for GPSR, derived from statistical dependencies among successive video frames, our modified GPSR algorithm can terminate faster and reconstruct video at better quality. The performance of our DCVS method is demonstrated via simulations to outperform three known CS reconstruction algorithms.
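
For readers unfamiliar with GPSR, below is a minimal sketch of the gradient-projection iteration it is built on (the standard split x = u - v with nonnegativity constraints). The paper's frame-derived initialization and stopping criteria are not reproduced; all names and constants here are ours.

```python
import numpy as np

def gpsr_sketch(A, y, tau=0.1, step=None, n_iter=200):
    """Minimal gradient-projection sketch of GPSR for
    min_x 0.5*||y - A x||^2 + tau*||x||_1,
    using the standard split x = u - v with u, v >= 0."""
    m, n = A.shape
    if step is None:
        # Safe constant step: the Hessian of the split problem
        # has spectral norm 2*||A||_2^2.
        step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    u = np.zeros(n)  # the paper warm-starts from the previous frame; we start at zero
    v = np.zeros(n)
    for _ in range(n_iter):
        r = y - A @ (u - v)          # residual
        grad_u = -(A.T @ r) + tau    # gradient w.r.t. u
        grad_v = (A.T @ r) + tau     # gradient w.r.t. v
        u = np.maximum(u - step * grad_u, 0.0)  # project onto u >= 0
        v = np.maximum(v - step * grad_v, 0.0)  # project onto v >= 0
    return u - v

# Toy use: recover a sparse signal from random projections.
rng = np.random.default_rng(0)
n, m, k = 256, 96, 8
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = gpsr_sketch(A, A @ x_true, tau=0.05)
```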

228 citations


Journal ArticleDOI
TL;DR: This paper has designed a perceptual full reference video quality assessment metric by focusing on the temporal evolutions of the spatial distortions, and has validated this metric with a dataset built from video sequences of various contents.
Abstract: The temporal distortions such as flickering, jerkiness, and mosquito noise play a fundamental part in video quality assessment. A temporal distortion is commonly defined as the temporal evolution, or fluctuation, of the spatial distortion on a particular area which corresponds to the image of a specific object in the scene. Perception of spatial distortions over time can be largely modified by their temporal changes, such as increases or decreases in the distortions, or periodic changes in the distortions. In this paper, we have designed a perceptual full reference video quality assessment metric by focusing on the temporal evolutions of the spatial distortions. As the perception of the temporal distortions is closely linked to the visual attention mechanisms, we have chosen to first evaluate the temporal distortion at eye fixation level. In this short-term temporal pooling, the video sequence is divided into spatio-temporal segments in which the spatio-temporal distortions are evaluated, resulting in spatio-temporal distortion maps. Afterwards, the global quality score of the whole video sequence is obtained by the long-term temporal pooling in which the spatio-temporal maps are spatially and temporally pooled. Consistent improvement over existing objective video quality assessment methods is observed. Our validation was carried out with a dataset built from video sequences of various contents.
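
As an illustration of the two-stage pooling described above, here is a hedged sketch: per-pixel distortion maps are averaged within fixed spatio-temporal segments (short-term pooling), then Minkowski-pooled spatially and temporally (long-term pooling). The segment sizes and exponents are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def pool_quality(dist_maps, st_block=16, st_window=10, p_spatial=2.0, p_temporal=2.0):
    """Two-stage pooling sketch: dist_maps is (T, H, W) per-pixel distortion.
    Short-term: average within spatio-temporal segments (~eye-fixation scale).
    Long-term: Minkowski pooling of the segment scores, first spatially,
    then temporally. Block/window sizes and exponents are illustrative."""
    T, H, W = dist_maps.shape
    scores_t = []
    for t0 in range(0, T - st_window + 1, st_window):
        chunk = dist_maps[t0:t0 + st_window]
        # Short-term pooling: mean distortion per spatio-temporal segment.
        seg = chunk.reshape(st_window, H // st_block, st_block,
                            W // st_block, st_block).mean(axis=(0, 2, 4))
        # Long-term spatial pooling (Minkowski over segments).
        scores_t.append((seg ** p_spatial).mean() ** (1.0 / p_spatial))
    scores_t = np.array(scores_t)
    # Long-term temporal pooling over the whole sequence.
    return (scores_t ** p_temporal).mean() ** (1.0 / p_temporal)

score = pool_quality(np.abs(np.random.default_rng(1).standard_normal((100, 64, 64))))
```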

170 citations


Journal ArticleDOI
TL;DR: The correlation between subjective and objective evaluation of color plus depth video and transmission over Internet protocol (IP) is investigated, and subjective results are used to determine more accurate objective quality assessment metrics for 3D color plus Depth video.
Abstract: In the near future, many conventional video applications are likely to be replaced by immersive video to provide a sense of "being there." This transition is facilitated by the recent advancement of 3D capture, coding, transmission, and display technologies. Stereoscopic video is the simplest form of 3D video available in the literature. "Color plus depth map" based stereoscopic video has attracted significant attention, as it can reduce storage and bandwidth requirements for the transmission of stereoscopic content over communication channels. However, quality assessment of coded video sequences can currently only be performed reliably using expensive and inconvenient subjective tests. To enable researchers to optimize 3D video systems in a timely fashion, it is essential that reliable objective measures are found. This paper investigates the correlation between subjective and objective evaluation of color plus depth video. The investigation is conducted for different compression ratios and different video sequences. Transmission over Internet protocol (IP) is also investigated. Subjective tests are performed to determine the image quality and depth perception of a range of differently coded video sequences, with packet loss rates ranging from 0% to 20%. The subjective results are used to determine more accurate objective quality assessment metrics for 3D color plus depth video.
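
The validation step the abstract describes, relating an objective score to subjective MOS, typically reduces to a regression plus a correlation measure. A toy sketch with synthetic data (the PSNR and MOS values below are made up) might look like:

```python
import numpy as np

def fit_objective_to_mos(objective, mos):
    """Sketch of the usual validation step: map an objective score
    (e.g., PSNR of the color and depth components) to subjective MOS
    and report the Pearson correlation. A simple linear fit stands in
    for the paper's regression; the data below is synthetic."""
    a, b = np.polyfit(objective, mos, 1)       # linear mapping
    predicted = a * objective + b
    r = np.corrcoef(predicted, mos)[0, 1]      # Pearson correlation
    return (a, b), r

rng = np.random.default_rng(2)
psnr = rng.uniform(25, 45, 30)                              # hypothetical PSNR values
mos = np.clip(0.12 * psnr + rng.normal(0, 0.3, 30), 1, 5)   # hypothetical MOS
(_, _), r = fit_objective_to_mos(psnr, mos)
print(f"Pearson correlation: {r:.3f}")
```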

169 citations


Proceedings ArticleDOI
16 Aug 2009
TL;DR: This paper focuses on characterizing and troubleshooting performance issues in one of the largest IPTV networks in North America, and develops a novel diagnosis tool called Giza that is specifically tailored to the enormous scale and hierarchical structure of the IPTV network.
Abstract: IPTV is increasingly being deployed and offered as a commercial service to residential broadband customers. Compared with traditional ISP networks, an IPTV distribution network (i) typically adopts a hierarchical instead of mesh-like structure, (ii) imposes more stringent requirements on both reliability and performance, (iii) has different distribution protocols (which make heavy use of IP multicast) and traffic patterns, and (iv) faces more serious scalability challenges in managing millions of network elements. These unique characteristics impose tremendous challenges in the effective management of IPTV network and service. In this paper, we focus on characterizing and troubleshooting performance issues in one of the largest IPTV networks in North America. We collect a large amount of measurement data from a wide range of sources, including device usage and error logs, user activity logs, video quality alarms, and customer trouble tickets. We develop a novel diagnosis tool called Giza that is specifically tailored to the enormous scale and hierarchical structure of the IPTV network. Giza applies multi-resolution data analysis to quickly detect and localize regions in the IPTV distribution hierarchy that are experiencing serious performance problems. Giza then uses several statistical data mining techniques to troubleshoot the identified problems and diagnose their root causes. Validation against operational experiences demonstrates the effectiveness of Giza in detecting important performance issues and identifying interesting dependencies. The methodology and algorithms in Giza promise to be of great use in IPTV network operations.
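
The multi-resolution localization idea can be illustrated with a toy sketch: per-device event counts are aggregated up the distribution hierarchy, and nodes that concentrate a large share of the events are flagged. The thresholding rule and names below are ours, not Giza's actual significance test.

```python
from collections import defaultdict

def localize(events, hierarchy, threshold=0.5):
    """Toy version of Giza-style multi-resolution localization:
    push per-device event counts up a hierarchy and flag nodes where
    events concentrate. 'hierarchy' maps each device to its chain of
    ancestors, e.g. DSLAM -> central office -> metro region."""
    counts = defaultdict(int)
    for device, n in events.items():
        for node in hierarchy[device]:
            counts[node] += n
    total = sum(events.values())
    # Flag nodes that absorb more than `threshold` of all events.
    return [node for node, c in counts.items() if c / total > threshold]

events = {"stb1": 40, "stb2": 35, "stb3": 2}
hierarchy = {
    "stb1": ["dslam_A", "co_1", "metro_X"],
    "stb2": ["dslam_A", "co_1", "metro_X"],
    "stb3": ["dslam_B", "co_1", "metro_X"],
}
print(localize(events, hierarchy))  # dslam_A and its ancestors exceed 50%
```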

163 citations


Proceedings ArticleDOI
29 Jul 2009
TL;DR: A database containing subjective assessment scores relative to 78 video streams encoded with H.264/AVC and corrupted by simulating transmission over an error-prone network is described, to enable reproducible research results in the field of video quality assessment.
Abstract: In this paper we describe a database containing subjective assessment scores relative to 78 video streams encoded with H.264/AVC and corrupted by simulating transmission over an error-prone network. The data has been collected from 40 subjects at the premises of two academic institutions. Our goal is to provide a balanced and comprehensive database to enable reproducible research results in the field of video quality assessment. In order to support research on Full-Reference, Reduced-Reference and No-Reference video quality assessment algorithms, both the uncompressed files and the H.264/AVC bitstreams of each video sequence have been made publicly available to the research community, together with the subjective results of the performed evaluations.

146 citations


Journal ArticleDOI
TL;DR: It was found that the proposed type-2 FLC, although it is specifically designed for Internet conditions, can also successfully react to the network conditions of an All-IP network and result in a superior delivered video quality when the control inputs were subject to noise.
Abstract: Intelligent congestion control is vital for encoded video streaming of a clip or film, as network traffic volatility and the associated uncertainties require constant adjustment of the bit rate. Existing solutions, including the standard transmission control protocol (TCP) friendly rate control equation-based congestion controller, are prone to fluctuations in their sending rate and may respond only when packet loss has already occurred. This is a major problem, because both fluctuations and packet loss affect the end-user's perception of the delivered video. A type-1 (T1) fuzzy logic congestion controller (FLC) can operate at video display rates and can reduce packet loss and rate fluctuations, despite uncertainties in measurements of delay arising from congestion and network traffic volatility. However, a T1 FLC employing precise T1 fuzzy sets cannot fully cope with the uncertainties associated with such dynamic network environments. A type-2 FLC using type-2 fuzzy sets can handle such uncertainties to produce improved performance. This paper proposes an interval type-2 FLC that achieves a superior delivered video quality compared with existing traditional controllers and a T1 FLC. To show the response in different network scenarios, tests demonstrate the response both in the presence of typical Internet cross-traffic and when other video streams occupy a bottleneck on an All-Internet protocol (IP) network. As All-IP networks are intended for multimedia traffic, it is important to develop a form of congestion control that can transfer to them from the mixed traffic environment of the Internet. It was found that the proposed type-2 FLC, although it is specifically designed for Internet conditions, can also successfully react to the network conditions of an All-IP network. When the control inputs were subject to noise, the type-2 FLC resulted in an order of magnitude performance improvement in comparison with the T1 FLC. The type-2 FLC also showed reduced packet loss when compared with the other controllers, again resulting in superior delivered video quality. When judged by established criteria, such as TCP-friendliness and delayed feedback, fuzzy logic congestion control offers a flexible solution to network bottlenecks. These findings offer the type-2 FLC as a way forward for congestion control of video streaming across packet-switched IP networks.

140 citations


Journal ArticleDOI
Zhengye Liu, Yanming Shen, Keith W. Ross, Shivendra S. Panwar, Yao Wang
TL;DR: LayerP2P combines layered video, mesh P2P distribution, and a tit-for-tat-like algorithm, in a manner such that a peer contributing more upload bandwidth receives more layers and consequently better video quality.
Abstract: Although there are several successful commercial deployments of live P2P streaming systems, current designs (i) lack incentives for users to contribute bandwidth resources; (ii) lack adaptation to aggregate bandwidth availability; and (iii) exhibit poor video quality when bandwidth availability falls below bandwidth demand. In this paper, we propose, prototype, deploy, and validate LayerP2P, a P2P live streaming system that addresses all three of these problems. LayerP2P combines layered video, mesh P2P distribution, and a tit-for-tat-like algorithm, in a manner such that a peer contributing more upload bandwidth receives more layers and consequently better video quality. We implement LayerP2P (including seeds, clients, trackers, and layered codecs), deploy the prototype in PlanetLab, and perform extensive experiments. We also examine a wide range of scenarios using trace-driven simulations. The results show that LayerP2P has high efficiency, provides differentiated service, adapts to bandwidth-deficient scenarios, and provides protection against free-riders.
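
The tit-for-tat layer allocation can be sketched in a few lines: neighbors are ranked by measured upload contribution and served correspondingly more layers. The rule below is an illustrative assumption, not the paper's exact policy.

```python
def allocate_layers(neighbors, n_layers):
    """Sketch of the tit-for-tat idea in LayerP2P: neighbors that upload
    more to us are served more layers of the scalable stream.
    'neighbors' maps peer id -> measured upload contribution (kb/s)."""
    ranked = sorted(neighbors, key=neighbors.get, reverse=True)
    allocation = {}
    for rank, peer in enumerate(ranked):
        # Top contributor gets all layers; each lower rank loses one,
        # but every peer receives at least the base layer.
        allocation[peer] = max(1, n_layers - rank)
    return allocation

print(allocate_layers({"p1": 400, "p2": 250, "p3": 50}, n_layers=4))
# {'p1': 4, 'p2': 3, 'p3': 2}
```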

134 citations


Journal ArticleDOI
TL;DR: An automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras and demonstrates the effectiveness of the method by comparing the results with two alternative methods against the ground truth agreed by multiple judges.
Abstract: Extracting key frames from video is of great interest in many applications, such as video summary, video organization, video compression, and prints from video. Key frame extraction is not a new problem but existing literature has focused primarily on sports or news video. In the personal or consumer video space, the biggest challenges for key frame selection are the unconstrained content and lack of any pre-imposed structures. First, in a psychovisual study, we conduct ground truth collection of key frames from video clips taken by digital cameras (as opposed to camcorders) using both first- and third-party judges. The goals of this study are to: 1) create a reference database of video clips reasonably representative of the consumer video space; 2) identify consensus key frames by which automated algorithms can be compared and judged for effectiveness, i.e., ground truth; and 3) uncover the criteria used by both first- and third-party human judges so these criteria can influence algorithm design. Next, we develop an automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides semantically meaningful information about the scene and the camera operator's general intents. In particular, camera and object motion are estimated and used to derive motion descriptors. A video clip is segmented into homogeneous parts based on major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment. In addition, confidence measures are computed for the candidates to enable ranking in semantic relevance. This method is scalable so that one can produce any desired number of key frames from the candidates. Finally, we demonstrate the effectiveness of our method by comparing the results with two alternative methods against the ground truth agreed by multiple judges.

127 citations


Proceedings ArticleDOI
24 Sep 2009
TL;DR: Different enhancements to the model are presented, allowing a much better approximation to the perceptual MOS values, knowing only the subjective movement content of the video application, classified as "Low", "Medium", or "High".
Abstract: In this paper, we show that the model proposed in ITU-T Recommendation G.1070 "Opinion model for video-telephony applications" cannot properly model perceptual video quality, especially in the low bit rate range, due to the great variation of MOS values depending on video content. In this work, we present different enhancements to the model, allowing a much better approximation to the perceptual MOS values, knowing only the subjective movement content of the video application, classified as "Low", "Medium", or "High". Studies were conducted on more than 1500 processed video clips, coded in MPEG-2 and H.264/AVC, at bit rates ranging from 50 kb/s to 12 Mb/s, in SD, VGA, CIF, and QCIF display formats. The subjective quality of the video clips was estimated using one of the quality metrics standardized in ITU-T Recommendation J.144 and ITU-R Recommendation BT.1683.
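
Below is a hedged sketch of the kind of content-dependent opinion model the paper argues for: MOS saturates with bitrate, and the saturation point shifts with the movement class. The functional form and all constants are invented for illustration; they are not the paper's fitted enhancements.

```python
import math

# Illustrative per-class parameters (NOT the values fitted in the paper):
# each movement class gets its own half-saturation bitrate, reflecting
# the observation that MOS vs. bitrate depends strongly on content.
CLASS_PARAMS = {"Low": 150.0, "Medium": 450.0, "High": 1200.0}  # kb/s

def predict_mos(bitrate_kbps, movement_class):
    """Sketch of a G.1070-style opinion model enhanced with a movement
    class: MOS saturates with bitrate, and the saturation point shifts
    with content activity. Functional form and constants are assumptions."""
    b50 = CLASS_PARAMS[movement_class]
    return 1.0 + 4.0 * (1.0 - math.exp(-bitrate_kbps / b50))

for cls in ("Low", "Medium", "High"):
    print(cls, round(predict_mos(500, cls), 2))  # same bitrate, very different MOS
```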

105 citations


Proceedings ArticleDOI
06 Dec 2009
TL;DR: The second version of Microsoft Research Asia Multimedia (MSRA-MM) is introduced, a dataset that aims to facilitate research in multimedia information retrieval and related areas and defines six standard tasks on the dataset.
Abstract: In this paper, we introduce the second version of Microsoft Research Asia Multimedia (MSRA-MM), a dataset that aims to facilitate research in multimedia information retrieval and related areas. The images and videos in the dataset are collected from a commercial search engine with more than 1000 queries. It contains about 1 million images and 20,000 videos. We also provide the surrounding texts that are obtained from more than 1 million web pages. The images and videos have been comprehensively annotated, including their relevance levels to corresponding queries, semantic concepts of images, and category and quality information of videos. We define six standard tasks on the dataset: (1) image search reranking; (2) image annotation; (3) query-by-example image search; (4) video search reranking; (5) video categorization; and (6) video quality assessment.

104 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper presents a two-step approach to video quality prediction, where video sequences are classified into groups representing different content types using cluster analysis, and video quality is predicted from network-level and application-level parameters using Principal Component Analysis (PCA).
Abstract: The aim of this paper is quality prediction for streaming MPEG4 video sequences over wireless networks for all video content types. Video content has an impact on video quality under the same network conditions. This aspect has not been widely explored in the development of reference-free video quality prediction models for streaming video over wireless or mobile networks. In this paper, we present a two-step approach to video quality prediction. First, video sequences are classified into groups representing different content types using cluster analysis. The classification of contents is based on temporal (movement) and spatial (edges, brightness) feature extraction. Second, based on the content type, video quality (in terms of Mean Opinion Score) is predicted from a network-level parameter (packet error rate) and application-level parameters (send bitrate, frame rate) using Principal Component Analysis (PCA). The performance of the developed model is evaluated with unseen datasets, and good prediction accuracy is obtained for all content types. The work can help in the development of reference-free video prediction models and priority control for content delivery networks.
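
The two-step approach maps naturally onto standard tooling; a sketch with scikit-learn and synthetic data (all features, parameters, and MOS values below are made up) could look like:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)

# Step 1: cluster clips into content types from temporal/spatial features.
# Columns: [movement, edge density, brightness] (illustrative features).
content_feats = rng.random((120, 3))
content_type = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(content_feats)

# Step 2: per content type, regress MOS on network/application parameters
# after PCA. Columns: [packet error rate, send bitrate, frame rate];
# the MOS values here are synthetic stand-ins.
params = rng.random((120, 3))
mos = 4.5 - 3.0 * params[:, 0] + 0.8 * params[:, 1] + rng.normal(0, 0.1, 120)

models = {}
for c in range(3):
    idx = content_type == c
    models[c] = make_pipeline(PCA(n_components=2), LinearRegression())
    models[c].fit(params[idx], mos[idx])

# Predict quality of a new clip once its content type is known.
print(models[0].predict([[0.02, 0.6, 0.9]]))
```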

Patent
16 Sep 2009
TL;DR: In this article, a video calling device resides functionally inline between a set-top box and a television set and can provide, in some cases, high performance video calling, high video quality, simplified installation, configuration and/or use, and ability to enjoy video calling in an inclusive, comfortable environment, such as a family room, den, or media room.
Abstract: Novel tools and techniques for providing video calling solutions. In some such solutions, a video calling device resides functionally inline between a set-top box and a television set. Such solutions can provide, in some cases, high performance video calling, high video quality, simplified installation, configuration and/or use, and/or the ability to enjoy video calling in an inclusive, comfortable environment, such as a family room, den, or media room.

Journal ArticleDOI
TL;DR: This work proposes video-aware opportunistic network coding schemes that take into account both the decodability of network codes by several receivers and the importance and deadlines of video packets, and shows that these schemes significantly improve both video quality and throughput.
Abstract: In this paper, we study video streaming over wireless networks with network coding capabilities. We build upon recent work, which demonstrated that network coding can increase throughput over a broadcast medium, by mixing packets from different flows into a single packet, thus increasing the information content per transmission. Our key insight is that, when the transmitted flows are video streams, network codes should be selected so as to maximize not only the network throughput but also the video quality. We propose video-aware opportunistic network coding schemes that take into account both the decodability of network codes by several receivers and the importance and deadlines of video packets. Simulation results show that our schemes significantly improve both video quality and throughput. This work is a first step towards content-aware network coding.
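
The packet-selection idea can be sketched as a small search: among XOR combinations of head-of-line packets, pick the one maximizing a value that counts only receivers able to decode it, weighted by deadline urgency. The scoring rule below is an illustrative assumption, not the paper's exact formulation.

```python
from itertools import combinations

def pick_code(queues, wants, deadlines, now):
    """Sketch of video-aware opportunistic coding: XOR one head-of-line
    packet per flow and pick the combination maximizing total value.
    A receiver decodes an XOR of k packets iff it already has k-1 of them;
    value favors packets with earlier deadlines."""
    flows = list(queues)
    best, best_score = None, -1.0
    for r in range(1, len(flows) + 1):
        for combo in combinations(flows, r):
            pkts = [queues[f] for f in combo]
            score = 0.0
            for rx, have in wants.items():
                missing = [p for p in pkts if p not in have]
                if len(missing) == 1:                 # decodable by rx
                    urgency = max(deadlines[missing[0]] - now, 1)
                    score += 1.0 / urgency            # earlier deadline, more value
            if score > best_score:
                best, best_score = combo, score
    return best

queues = {"flowA": "a3", "flowB": "b7"}
wants = {"rx1": {"b7"}, "rx2": {"a3"}}   # each rx already holds the other's packet
deadlines = {"a3": 5, "b7": 12}
print(pick_code(queues, wants, deadlines, now=0))  # ('flowA', 'flowB') serves both
```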

Journal ArticleDOI
TL;DR: The authors describe the visual impairments that result from such packet losses and present the results of testing and analysis to compare impairments for different loss durations for both MPEG-2-encoded standard and high-definition services.
Abstract: For pt. 1 see ibid., vol. 13, no. 1, p.70-5 (2009). In this second part of a two-part article, the authors highlight the impact that different durations of IP packet loss have on the quality of experience for IP-based video streaming services. They describe the visual impairments that result from such packet losses and present the results of testing and analysis to compare impairments for different loss durations for both MPEG-2-encoded standard and high-definition services.

Journal ArticleDOI
TL;DR: A framework that adds a temporal distortion awareness to typical video quality measurement algorithms and shows that the processing steps and the signal representations that are generated by the algorithm follow the reasoning of a human observer in a subjective experiment is presented.
Abstract: The measurement of video quality for lossy and low-bitrate network transmissions is a challenging topic. In particular, the temporal artifacts introduced by video transmission systems and their effects on the viewer's satisfaction have to be addressed. This paper focuses on a framework that adds a temporal distortion awareness to typical video quality measurement algorithms. A motion estimation is used to track image areas over time. Based on the motion vectors and the motion prediction error, the appearance of new image areas and the display time of objects are evaluated. Additionally, degradations which stick to moving objects can be judged more exactly. An implementation of this framework for multimedia sequences, e.g., QCIF, CIF, or VGA resolution, is presented in detail. It shows that the processing steps and the signal representations that are generated by the algorithm follow the reasoning of a human observer in a subjective experiment. The improvements that can be achieved with the newly proposed algorithm are demonstrated using the results of the Multimedia Phase I database of the Video Quality Experts Group.

Journal ArticleDOI
TL;DR: In this paper, an analytical framework for optimal rate allocation based on observed available bit rate (ABR) and round-trip time (RTT) over each access network and video distortion-rate (DR) characteristics is proposed.
Abstract: We consider the problem of rate allocation among multiple simultaneous video streams sharing multiple heterogeneous access networks. We develop and evaluate an analytical framework for optimal rate allocation based on observed available bit rate (ABR) and round-trip time (RTT) over each access network and video distortion-rate (DR) characteristics. The rate allocation is formulated as a convex optimization problem that minimizes the total expected distortion of all video streams. We present a distributed approximation of its solution and compare its performance against H∞-optimal control and two heuristic schemes based on TCP-style additive-increase-multiplicative-decrease (AIMD) principles. The various rate allocation schemes are evaluated in simulations of multiple high-definition (HD) video streams sharing multiple access networks. Our results demonstrate that, in comparison with heuristic AIMD-based schemes, both media-aware allocation and H∞-optimal control benefit from proactive congestion avoidance and reduce the average packet loss rate from 45% to below 2%. Improvement in average received video quality ranges from 1.5 to 10.7 dB in PSNR for various background traffic loads and video playout deadlines. Media-aware allocation further exploits its knowledge of the video DR characteristics to achieve a more balanced video quality among all streams.
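
For the parametric distortion-rate model D_i(R) = d0_i + theta_i/(R - r0_i) commonly used in this line of work, the convex allocation problem even has a closed form. A sketch (the model and the numbers are assumed, not taken from the paper):

```python
import numpy as np

def allocate_rates(theta, r0, capacity):
    """Closed-form sketch of media-aware rate allocation: minimize total
    expected distortion sum_i theta_i/(R_i - r0_i) subject to
    sum_i R_i <= capacity, under the parametric model
    D_i(R) = d0_i + theta_i/(R - r0_i). Lagrangian stationarity gives
    theta_i/(R_i - r0_i)^2 = lambda, i.e. R_i = r0_i + sqrt(theta_i/lambda),
    and the budget constraint fixes lambda."""
    theta, r0 = np.asarray(theta, float), np.asarray(r0, float)
    budget = capacity - r0.sum()
    assert budget > 0, "capacity must exceed the sum of the r0 offsets"
    scale = budget / np.sqrt(theta).sum()   # equals sqrt(1/lambda)
    return r0 + scale * np.sqrt(theta)

# Two HD streams sharing 8 Mb/s: the stream with the steeper D-R curve
# (larger theta) receives the larger share.
print(allocate_rates(theta=[2e6, 0.5e6], r0=[300e3, 300e3], capacity=8e6))
```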

Proceedings ArticleDOI
29 Jul 2009
TL;DR: This paper considers natural scene statistics and adopts multi-resolution decomposition methods to extract reliable features for no-reference image and video blur assessment, and shows that the algorithm has high correlation with human judgment in assessing blur distortion of images.
Abstract: The increasing number of demanding consumer video applications, as exemplified by cell phone and other low-cost digital cameras, has boosted interest in no-reference objective image and video quality assessment (QA). In this paper, we focus on no-reference image and video blur assessment. There already exist a number of no-reference blur metrics, but most are based on evaluating the widths of intensity edges, which may not reflect real image quality in many circumstances. Instead, we consider natural scene statistics and adopt multi-resolution decomposition methods to extract reliable features for QA. First, a probabilistic support vector machine (SVM) is applied as a rough image quality evaluator; then the detail image is used to refine and form the final blur metric. The algorithm is tested on the LIVE Image Quality Database; the results show that the algorithm has high correlation with human judgment in assessing blur distortion of images.

Journal ArticleDOI
TL;DR: The performance analysis of ROIAS is presented in terms of the impact on user-perceived video quality, measured using subjective video quality assessment techniques based on human subjects, and the benefit of using ROIAS for adaptive video quality delivery is demonstrated.
Abstract: Adaptive multimedia streaming relies on adjusting the video content's bit-rate to meet network conditions, in the quest to reduce packet loss and the resulting video quality degradations. Current multimedia adaptation schemes uniformly adjust the compression over the entire image area. However, research has shown that user attention is focused mostly on certain image areas, denoted areas of maximum user interest (AMUI), and that interest decreases with increasing distance from the AMUI. The region of interest-based adaptive multimedia streaming scheme (ROIAS) is introduced to perform bit-rate adaptation to network conditions by adjusting video quality relative to the AMUI location. This paper also extends ROIAS to support multiple areas of maximum user interest within the same video frame. This paper presents the performance analysis of ROIAS in terms of the impact on user-perceived video quality, measured using subjective video quality assessment techniques based on human subjects. The tests use a wide range of video clips, which differ in terms of spatial and temporal complexity and region of interest location and variation. A comparative evaluation of both subjective and objective video quality test results is performed and demonstrates the benefit of using ROIAS for adaptive video quality delivery.
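
The core ROIAS idea, degrading quality with distance from the areas of maximum user interest, can be sketched as a per-pixel compression-strength map. The linear falloff, QP bounds, and handling of multiple AMUIs via minimum distance are our illustrative choices.

```python
import numpy as np

def roi_quality_map(h, w, rois, qp_min=22, qp_max=40, falloff=80.0):
    """Sketch of the ROIAS idea: compression strength (a hypothetical QP
    per pixel) grows with distance from the nearest area of maximum user
    interest, so quality is preserved where users look. Multiple AMUIs
    are supported by taking the minimum distance to any of them."""
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.full((h, w), np.inf)
    for (cy, cx) in rois:
        dist = np.minimum(dist, np.hypot(ys - cy, xs - cx))
    return qp_min + (qp_max - qp_min) * np.clip(dist / falloff, 0.0, 1.0)

qp_map = roi_quality_map(144, 176, rois=[(72, 60), (72, 120)])  # two AMUIs, QCIF
print(qp_map.min(), qp_map.max())  # 22.0 at the AMUIs, 40.0 in the far corners
```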

Proceedings ArticleDOI
29 Jul 2009
TL;DR: Using numerical simulations as well as real-life data from a multitude of subjective experiments, this paper attempts to shed more light on the distribution and variability of subjective ratings, the effects of discrete rating scales, and the number of subjects needed.
Abstract: Subjective video quality experiments are classical statistical measurements, and as such the mathematical tools for their analysis are well understood. However, there remain certain practical aspects that are rarely discussed, yet are essential to the design of efficient experiments. Using numerical simulations as well as real-life data from a multitude of subjective experiments, this paper attempts to shed more light on the distribution and variability of subjective ratings, the effects of discrete rating scales, and the number of subjects needed.
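
The "number of subjects needed" question lends itself to exactly the kind of numerical simulation the paper uses. A minimal sketch (the discrete rating model and noise level are assumptions):

```python
import numpy as np

def ci_halfwidth_vs_subjects(sigma=0.8, n_range=(5, 50), trials=2000, seed=4):
    """Numerical sketch of 'how many subjects are enough': simulate
    discrete 5-point ratings around a true MOS of 3 and report the
    average 95% confidence-interval half-width as the panel grows."""
    rng = np.random.default_rng(seed)
    for n in range(*n_range, 5):
        ratings = np.clip(np.rint(3.0 + sigma * rng.standard_normal((trials, n))), 1, 5)
        hw = 1.96 * ratings.std(axis=1, ddof=1) / np.sqrt(n)
        print(f"{n:3d} subjects -> mean CI half-width {hw.mean():.3f}")

ci_halfwidth_vs_subjects()
```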

Proceedings ArticleDOI
19 Oct 2009
TL;DR: An approach for extracting visual attention regions based on a combination of a bottom-up saliency model and semantic image analysis and a novel quality metric is proposed which can exploit the attributes of visual attention information adequately.
Abstract: Most existing quality metrics do not take human attention analysis into account. Attention to particular objects or regions is an important attribute of the human vision and perception system in measuring perceived image and video quality. This paper presents an approach for extracting visual attention regions based on a combination of a bottom-up saliency model and semantic image analysis. The use of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural SIMilarity) in extracted attention regions is analyzed for image/video quality assessment, and a novel quality metric is proposed which can exploit the attributes of visual attention information adequately. The experimental results with respect to the subjective measurement demonstrate that the proposed metric outperforms the current methods.
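
One simple way to fold an attention map into an existing metric, in the spirit of the approach above, is to weight the per-pixel error by normalized saliency before computing PSNR; the paper's actual combination rule may differ in detail.

```python
import numpy as np

def saliency_weighted_psnr(ref, test, saliency, peak=255.0):
    """Sketch of attention-weighted quality: the per-pixel squared error
    is weighted by a normalized saliency map before computing PSNR, so
    distortions inside visual-attention regions dominate the score."""
    w = saliency / saliency.sum()
    mse = (w * (ref.astype(float) - test.astype(float)) ** 2).sum()
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(5)
ref = rng.integers(0, 256, (64, 64))
test = np.clip(ref + rng.normal(0, 4, (64, 64)), 0, 255)
yy, xx = np.mgrid[0:64, 0:64]
sal = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 200.0)  # center-biased saliency
print(saliency_weighted_psnr(ref, test, sal))
```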

Journal ArticleDOI
TL;DR: This paper examines a strategy for maximizing the network lifetime in wireless visual sensor networks by jointly optimizing the source rates, the encoding powers, and the routing scheme and demonstrates that the proposed algorithm can achieve a much longer network lifetime compared to the scheme optimized for the conventional wireless sensor networks.
Abstract: Network lifetime maximization is a critical issue in wireless sensor networks since each sensor has a limited energy supply. In contrast with conventional sensor networks, video sensor nodes compress the video before transmission. The encoding process demands a high power consumption, and thus raises a great challenge to the maintenance of a long network lifetime. In this paper, we examine a strategy for maximizing the network lifetime in wireless visual sensor networks by jointly optimizing the source rates, the encoding powers, and the routing scheme. Fully distributed algorithms are developed using the Lagrangian duality to solve the lifetime maximization problem. We also examine the relationship between the collected video quality and the maximal network lifetime. Through extensive numerical simulations, we demonstrate that the proposed algorithm can achieve a much longer network lifetime compared to the scheme optimized for the conventional wireless sensor networks.

Journal ArticleDOI
TL;DR: A novel data hiding method in the compressed video domain that completely preserves the image quality of the host video while embedding information into it and is also reversible, where the embedded information could be removed to obtain the original video.
Abstract: Although many data hiding methods are proposed in the literature, all of them distort the quality of the host content during data embedding. In this paper, we propose a novel data hiding method in the compressed video domain that completely preserves the image quality of the host video while embedding information into it. Information is embedded into a compressed video by simultaneously manipulating Mquant and quantized discrete cosine transform coefficients, which are the significant parts of MPEG and H.26x-based compression standards. To the best of our knowledge, this data hiding method is the first attempt of its kind. When fed into an ordinary video decoder, the modified video completely reconstructs the original video even compared at the bit-to-bit level. Our method is also reversible, where the embedded information could be removed to obtain the original video. A new data representation scheme called reverse zerorun length (RZL) is proposed to exploit the statistics of macroblock for achieving high embedding efficiency while trading off with payload. It is theoretically and experimentally verified that RZL outperforms matrix encoding in terms of payload and embedding efficiency for this particular data hiding method. The problem of video bitstream size increment caused by data embedding is also addressed, and two independent solutions are proposed to suppress this increment. Basic performance of this data hiding method is verified through experiments on various existing MPEG-1 encoded videos. In the best case scenario, an average increase of four bits in the video bitstream size is observed for every message bit embedded.

Proceedings ArticleDOI
08 Dec 2009
TL;DR: This paper reviews some of the existing standards documents, on-going work and future trends in this space.
Abstract: With the wide-spread use of digital video, quality considerations have become essential, and industry demand for video quality measurement standards is rising. Various organizations are working on such standards. This paper reviews some of the existing standards documents, on-going work and future trends in this space.

Patent
Min Dai, Tao Xue, Chia-Yuan Teng
29 Jul 2009
TL;DR: In this article, intelligent frame skipping techniques that may be used by an encoding device or a decoding device to facilitate frame skipping in a manner that may help to minimize quality degradation due to the frame skipping are described.
Abstract: This disclosure provides intelligent frame skipping techniques that may be used by an encoding device or a decoding device to facilitate frame skipping in a manner that may help to minimize quality degradation due to the frame skipping. In particular, the described techniques may implement a similarity metric designed to identify good candidate frames for frame skipping. In this manner, noticeable reductions in the video quality caused by frame skipping, as perceived by a viewer of the video sequence, may be reduced relative to conventional frame skipping techniques. The described techniques advantageously operate in a compressed domain.
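
A hedged sketch of similarity-driven frame skipping: a frame becomes a skip candidate when it is highly similar to the previously kept frame, with a cap on consecutive skips. Normalized correlation stands in here for the disclosure's compressed-domain similarity metric; the threshold and cap are illustrative.

```python
import numpy as np

def skip_candidates(frames, threshold=0.95, max_run=2):
    """Mark frames as skip candidates when they closely resemble the
    previously kept frame, bounding consecutive skips to limit judder."""
    kept, skipped, run = [0], [], 0
    for i in range(1, len(frames)):
        a = frames[kept[-1]].ravel().astype(float)
        b = frames[i].ravel().astype(float)
        sim = np.corrcoef(a, b)[0, 1]   # stand-in similarity metric
        if sim > threshold and run < max_run:
            skipped.append(i); run += 1
        else:
            kept.append(i); run = 0
    return kept, skipped

frames = np.random.default_rng(6).integers(0, 256, (10, 32, 32))
frames[3] = frames[2]; frames[4] = frames[2]   # near-static stretch
print(skip_candidates(frames))                 # frames 3 and 4 become candidates
```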

Journal ArticleDOI
TL;DR: The proposed NORM (NO-Reference video quality Monitoring), an algorithm to assess the quality degradation of H.264/AVC video affected by channel errors, provides an estimate of the mean square error distortion at the macroblock level, showing good linear correlation.
Abstract: When video is transmitted over a packet-switched network, the sequence reconstructed at the receiver side might suffer from impairments introduced by packet losses, which can only be partially healed by the action of error concealment techniques. In this context we propose NORM (NO-Reference video quality Monitoring), an algorithm to assess the quality degradation of H.264/AVC video affected by channel errors. NORM works at the receiver side, where both the original and an uncorrupted copy of the video content are unavailable. We explicitly account for distortion introduced by spatial and temporal error concealment together with the effect of temporal motion-compensation. NORM provides an estimate of the mean square error distortion at the macroblock level, showing good linear correlation (correlation coefficient greater than 0.80) with the distortion computed in full-reference mode. In addition, the estimate at the macroblock level can be successfully exploited by forward quality monitoring systems that compute objective quality metrics to predict mean opinion score (MOS) values. As a proof of concept, we feed the output of NORM to a reduced-reference quality monitoring system that computes an estimate of the structural similarity metric (SSIM) score, which is known to be well correlated with perceptual quality.

Journal ArticleDOI
TL;DR: The main aim of this paper is the prediction of video quality combining the application and network level parameters for all content types and confirmed that the video quality is more sensitive to network level compared to application level parameters.
Abstract: There are many parameters that affect video quality, but their combined effect is not well identified and understood when video is transmitted over mobile/wireless networks. In addition, video content has an impact on video quality under the same network conditions. The main aim of this paper is the prediction of video quality combining the application- and network-level parameters for all content types. First, video sequences are classified into groups representing different content types using cluster analysis. The classification of contents is based on temporal (movement) and spatial (edges, brightness) feature extraction. Second, the behaviour of video quality is studied and analyzed for wide-ranging variations of a set of selected parameters. Finally, two learning models are developed, based on (1) ANFIS, to estimate the visual perceptual quality in terms of the Mean Opinion Score (MOS) and decodable frame rate (Q value), and (2) regression modeling, to estimate the visual perceptual quality in terms of the MOS. We trained three ANFIS-based ANNs and regression-based models for the three distinct content types using a combination of network- and application-level parameters, and tested the two models using an unseen dataset. We confirmed that video quality is more sensitive to network-level than to application-level parameters. Preliminary results show that good prediction accuracy was obtained from both models. However, the regression-based model performed better in terms of the correlation coefficient and the root mean squared error. The work should help in the development of a reference-free video prediction model and Quality of Service (QoS) control methods for video over wireless/mobile networks.

Journal ArticleDOI
TL;DR: VQEG's work is reviewed, paying particular attention to the group's approach to validation testing of objective perceptual quality models.
Abstract: For industry, the need to access accurate and reliable objective video metrics has become more pressing with the advent of new video applications and services such as mobile broadcasting, Internet video, and Internet Protocol television (IPTV). Industry-class objective quality-measurement models have a wide range of uses, including equipment testing (e.g., codec evaluation), transmission-planning and network-dimensioning tasks, head-end quality assurance, in-service network monitoring, and client-based quality measurement. The Video Quality Experts Group (VQEG) is the primary forum for validation testing of objective perceptual quality models. The work of VQEG has resulted in International Telecommunication Union (ITU) standardization of objective quality models designed for standard-definition television and for multimedia applications. This article reviews VQEG's work, paying particular attention to the group's approach to validation testing.

Journal ArticleDOI
TL;DR: A network-coding-based cooperative repair framework for the ad-hoc peer group to improve broadcast video quality during channel losses by first imposing network coding structures globally, and then selecting the appropriate video streams and network coding types within the structures locally so that repair can be optimized for broadcast video in a rate-distortion manner.
Abstract: In a scenario where each peer of an ad-hoc wireless local area network (WLAN) receives one of many available video streams from a wireless wide area network (WWAN), we propose a network-coding-based cooperative repair framework for the ad-hoc peer group to improve broadcast video quality during channel losses. Specifically, we first impose network coding structures globally, and then select the appropriate video streams and network coding types within the structures locally, so that repair can be optimized for broadcast video in a rate-distortion manner. Innovative probability, the likelihood that a repair packet is useful in data recovery to a receiving peer, is analyzed in this setting for accurate optimization of the network codes. Our simulation results show that by using our framework, video quality can be improved by up to 19.71 dB over the un-repaired video stream and by up to 5.39 dB over a video stream using traditional unstructured network coding.

Journal ArticleDOI
TL;DR: Experimental results show that when multiview video is compressed with joint multIView video model, the proposed method increases compression efficiency by up to 1.0 dB in luma peak signal-to-noise ratio (PSNR) compared to compressing the original uncorrected video.
Abstract: In multiview video, a number of cameras capture the same scene from different viewpoints. There can be significant variations in the color of views captured with different cameras, which negatively affects performance when the videos are compressed with inter-view prediction. In this letter, a method is proposed for correcting the color of multiview video sets as a preprocessing step to compression. Unlike previous work, where one of the captured views is used as the color reference, we correct all views to match the average color of the set of views. Block-based disparity estimation is used to find matching points between all views in the video set, and the average color is calculated for these matching points. A least-squares regression is performed for each view to find a function that will make the view most closely match the average color. Experimental results show that when multiview video is compressed with joint multiview video model, the proposed method increases compression efficiency by up to 1.0 dB in luma peak signal-to-noise ratio (PSNR) compared to compressing the original uncorrected video.
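
The per-view correction step reduces to an ordinary least-squares fit. Below is a sketch assuming the disparity-matched samples are already available; the per-channel gain/offset model and the synthetic numbers are ours.

```python
import numpy as np

def fit_color_correction(view_samples, avg_samples):
    """Sketch of the paper's correction step: for one view, fit a
    per-channel linear function (gain, offset) by least squares so that
    the view's colors at disparity-matched points best match the average
    color of all views. Real use would obtain the matched samples via
    block-based disparity estimation."""
    corrected = {}
    for ch in range(view_samples.shape[1]):
        gain, offset = np.polyfit(view_samples[:, ch], avg_samples[:, ch], 1)
        corrected[ch] = (gain, offset)
    return corrected

rng = np.random.default_rng(7)
avg = rng.uniform(16, 235, (500, 3))                 # average color at matched points
view = 0.9 * avg + 12 + rng.normal(0, 2, (500, 3))   # this camera runs warm/bright
for ch, (g, o) in fit_color_correction(view, avg).items():
    print(f"channel {ch}: gain={g:.3f} offset={o:.2f}")  # roughly (1/0.9, -12/0.9)
```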

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A scalable, lightweight, no-reference framework to infer video QoE is presented; its MOS predictions are in close agreement with subjective perceptions, and an implementation on a standard Linux PC can compute 20 MOS calculations per second with 3 parameters and 18 partitions of the QoE space.
Abstract: We present a scalable, lightweight, no-reference framework to infer video QoE. Our framework revolves around a one-time offline construction of a k-dimensional space, which we call the QoE space. The k dimensions accommodate k parameters (network-dependent/independent) that potentially affect video quality. The k-dimensional space is partitioned into N representative zones, each with a QoE index. Instantaneous parameter values are matched with the indices to infer QoE. To validate our framework, we construct a 3-dimensional QoE space with bit-rate, loss, and delay as the principal components. We create 18 video samples with unique combinations of the 3 parameters. 77 human subjects rated these video samples on a scale of 1 to 5 to create the QoE space. In a second survey, our predicted MOS was compared to 49 human responses. Results show that our MOS predictions are in close agreement with subjective perceptions. An implementation of our framework on a standard Linux PC shows we can compute 20 MOS calculations per second with 3 parameters and 18 partitions of the QoE space.
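
The run-time side of the framework is essentially a nearest-zone lookup in the offline-built QoE space; here is a minimal sketch with invented zone centers and MOS indices (the real system uses 18 zones built from the subjective survey).

```python
import numpy as np

class QoESpace:
    """Sketch of the QoE-space lookup: the (bitrate, loss, delay) space is
    partitioned into zones offline, each carrying a MOS index from a
    subjective survey; at run time, instantaneous measurements are matched
    to the nearest zone. Zone centers and MOS values below are invented."""
    def __init__(self, centers, mos):
        self.centers = np.asarray(centers, float)   # N x k zone representatives
        self.mos = np.asarray(mos, float)           # MOS index per zone

    def infer(self, sample):
        d = np.linalg.norm(self.centers - np.asarray(sample, float), axis=1)
        return self.mos[np.argmin(d)]

# Three of the zones, as (bitrate Mb/s, loss %, delay ms); real use would
# normalize the dimensions before measuring distance.
space = QoESpace(centers=[[4.0, 0.1, 20], [2.0, 1.0, 80], [0.8, 3.0, 150]],
                 mos=[4.6, 3.2, 1.9])
print(space.infer([3.5, 0.3, 30]))   # -> 4.6: closest to the best zone
```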