Proceedings ArticleDOI

Deep Blind Video Quality Assessment Based on Temporal Human Perception

29 Aug 2018 - pp. 619-623
TL;DR: A deep learning scheme named Deep Blind Video Quality Assessment (DeepBVQA) is proposed to achieve a more accurate and reliable video quality predictor by considering various spatial and temporal cues which have not been considered before.
Abstract: A high-performance video quality assessment (VQA) algorithm is essential for delivering high-quality video to viewers. However, because the nonlinear perceptual mapping between a video's distortion level and its subjective quality score is not precisely defined, accurately predicting video quality remains difficult. In this paper, we propose a deep learning scheme named Deep Blind Video Quality Assessment (DeepBVQA) to achieve a more accurate and reliable video quality predictor by considering various spatial and temporal cues that have not been considered before. We use a CNN to extract spatial cues from each video and propose new hand-crafted features for the temporal cues. Experiments show that the method outperforms other state-of-the-art no-reference (NR) VQA models and that introducing hand-crafted temporal features is highly effective for VQA.
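For illustration, one simple way to realize such a temporal cue is to track how a per-frame sharpness measure fluctuates over time. The sketch below is not the paper's exact feature set; it assumes sharpness is approximated by the variance of a Laplacian-filtered frame and summarizes its temporal variation.

```python
# Illustrative sketch (not the paper's exact features): frame sharpness is
# approximated by the variance of the Laplacian, and the temporal cue is the
# variation of sharpness across consecutive frames.
import numpy as np
from scipy.ndimage import laplace

def frame_sharpness(frame_gray: np.ndarray) -> float:
    """Variance of the Laplacian as a simple sharpness proxy."""
    return float(laplace(frame_gray.astype(np.float64)).var())

def temporal_sharpness_variation(frames: list[np.ndarray]) -> dict:
    """Summarize how frame sharpness fluctuates over time."""
    s = np.array([frame_sharpness(f) for f in frames])
    ds = np.abs(np.diff(s))                  # frame-to-frame sharpness change
    return {
        "mean_sharpness": s.mean(),          # global spatial cue
        "std_sharpness": s.std(),            # global variation of local quality
        "mean_temporal_change": ds.mean(),   # temporal sharpness-variation cue
    }
```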
Citations
Journal ArticleDOI
TL;DR: A new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, comprising 480 videos at 6 different frame rates obtained from 16 diverse contents, is presented and made available online for public use and evaluation purposes.
Abstract: High frame rate (HFR) videos are becoming increasingly common with the tremendous popularity of live, high-action streaming content such as sports. Although HFR contents are generally of very high quality, high bandwidth requirements make them challenging to deliver efficiently, while simultaneously maintaining their quality. To optimize trade-offs between bandwidth requirements and video quality, in terms of frame rate adaptation, it is imperative to understand the intricate relationship between frame rate and perceptual video quality. To advance progress in this direction, we designed a new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, which comprises 480 videos spanning 6 different frame rates, obtained from 16 diverse contents. In order to understand the combined effects of compression and frame rate adjustment, we also processed videos at 5 compression levels at each frame rate. To obtain subjective labels on the videos, we conducted a human study yielding 19,000 human quality ratings obtained from a pool of 85 human subjects. We also conducted a holistic evaluation of existing state-of-the-art Full and No-Reference video quality algorithms, and statistically benchmarked their performance on the new database. The LIVE-YT-HFR database has been made available online for public use and evaluation purposes, with hopes that it will help advance research in this exciting video technology direction. It may be obtained at https://live.ece.utexas.edu/research/LIVE_YT_HFR/LIVE_YT_HFR/index.html .

32 citations

Proceedings ArticleDOI
Anh-Duc Nguyen, Seonghwa Choi, Woojae Kim, Sewoong Ahn, Jinwoo Kim, Sanghoon Lee
17 Sep 2019
TL;DR: Through extensive experiments on image classification and style transfer using different architectures, it is demonstrated that the proposed padding technique consistently outperforms the default zero padding and hence can be a potential candidate for its replacement.
Abstract: Even though zero padding is the usual choice in convolutional neural networks for maintaining the output size, it is problematic because it significantly alters the input distribution around the border region. To mitigate this problem, we propose in this paper a new padding technique termed distribution padding. The goal of the method is to approximately maintain the statistics of the input border regions. We introduce two different ways to achieve this goal. In both approaches, the padded values are derived from the means of the border patches, but those values are handled differently in each variant. Through extensive experiments on image classification and style transfer using different architectures, we demonstrate that the proposed padding technique consistently outperforms the default zero padding, and hence is a potential candidate for its replacement.
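As a rough illustration of the idea (not the paper's exact formulation, which defines two variants with different patch handling), the sketch below fills each padded strip with the mean of the adjacent border region so the padded values stay close to the local input statistics:

```python
import numpy as np

def mean_border_pad(x: np.ndarray, pad: int = 1) -> np.ndarray:
    """Pad a 2-D feature map so the border statistics are roughly preserved.

    Illustrative only: each side is filled with the mean of its 'pad'-wide
    border strip instead of zeros, which keeps the padded values close to
    the local input distribution.
    """
    h, w = x.shape
    out = np.empty((h + 2 * pad, w + 2 * pad), dtype=x.dtype)
    out[pad:pad + h, pad:pad + w] = x
    out[:pad, :]  = x[:pad, :].mean()    # top strip
    out[-pad:, :] = x[-pad:, :].mean()   # bottom strip
    out[:, :pad]  = x[:, :pad].mean()    # left strip (including corners)
    out[:, -pad:] = x[:, -pad:].mean()   # right strip (including corners)
    return out
```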

18 citations

Journal ArticleDOI
01 Mar 2022 - Sensors
TL;DR: A novel, innovative deep learning-based approach for NR-VQA that relies on a set of pre-trained convolutional neural networks applied in parallel to versatilely characterize potential image and video distortions, setting a new state of the art on two large benchmark video quality assessment databases with authentic distortions.
Abstract: With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become a very hot research topic. Over the years, many different approaches have been introduced in the literature to evaluate the perceptual quality of digital videos. Due to the advent of large benchmark video quality assessment databases, deep learning has attracted a significant amount of attention in this field in recent years. This paper presents a novel, innovative deep learning-based approach for NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to versatilely characterize potential image and video distortions. Specifically, temporally pooled and saliency-weighted video-level deep features are extracted with the help of a set of pre-trained CNNs and mapped onto perceptual quality scores independently from each other. Finally, the quality scores coming from the different regressors are fused together to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the presented results underline that the decision fusion of multiple deep architectures can significantly benefit NR-VQA.
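A minimal sketch of the pooling-and-fusion stage, under the assumption that per-frame features and per-frame saliency weights are already available from each pre-trained backbone; the simple mean fusion used here is an illustrative stand-in for the paper's regressor fusion:

```python
import numpy as np

def pool_video_features(frame_feats: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Saliency-weighted temporal pooling of per-frame CNN features.

    frame_feats: (T, D) features from one pre-trained backbone
    saliency:    (T,)   per-frame saliency weights
    """
    w = saliency / (saliency.sum() + 1e-12)
    return (w[:, None] * frame_feats).sum(axis=0)       # (D,) video-level feature

def fuse_quality(per_backbone_scores: list[float]) -> float:
    """Decision fusion: here simply the mean of the regressors' predictions."""
    return float(np.mean(per_backbone_scores))
```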

10 citations

Journal ArticleDOI
TL;DR: HEterogeneous Knowledge Ensemble (HEKE) is proposed to train a feature encoder specific to blind video quality assessment (BVQA) that extracts spatiotemporal representations directly from videos.
Abstract: Blind video quality assessment (BVQA) is of great importance for video-related applications, yet it remains challenging even in this deep learning era. The difficulty lies in the shortage of large-scale labeled data, which makes it hard to train a robust spatiotemporal encoder for BVQA. To relieve this difficulty, we first build a video dataset containing over 320K samples suffering from various compression and transmission artifacts. Since manually annotating such a dataset with subjective scores is labor-intensive and time-consuming, we adopt reference-based VQA algorithms to weakly label the data automatically. We consider that a single weak label derives from a single source of knowledge, which is deficient and incomplete for VQA. To alleviate the bias from a single weak label (i.e., a single source of knowledge) in the weakly labeled dataset, we propose HEterogeneous Knowledge Ensemble (HEKE) for spatiotemporal representation learning. Compared to learning from a single source of knowledge, learning with HEKE is theoretically expected to achieve a lower infimum and to obtain richer representations. On the basis of the built dataset and the HEKE methodology, a feature encoder specific to BVQA is formed, which directly extracts spatiotemporal representations from videos. The video quality can then be obtained either in a completely blind manner without ground truth, or via a fine-tuned regressor with labels. Extensive experiments on various VQA databases show that our BVQA model with the pretrained encoder achieves state-of-the-art performance. More surprisingly, even when trained on synthetic data, our model still shows competitive performance on authentic databases. The data and source code will be available at https://github.com/Sissuire/BVQA-HEKE .
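The weak-labeling step can be pictured as follows; the full-reference metrics are left as generic placeholders rather than the specific algorithms used by the authors, and the multi-label target is what a multi-head encoder would regress:

```python
from typing import Callable, Sequence
import numpy as np

# Placeholder type: a full-reference VQA metric maps (distorted, reference) -> score.
FRMetric = Callable[[np.ndarray, np.ndarray], float]

def weak_labels(distorted: np.ndarray,
                reference: np.ndarray,
                fr_metrics: Sequence[FRMetric]) -> np.ndarray:
    """Annotate one distorted video with several heterogeneous weak labels.

    Each FR metric contributes one label; training a multi-head regressor on
    all of them (rather than on any single label) is the intuition behind
    heterogeneous knowledge ensembling.
    """
    return np.array([m(distorted, reference) for m in fr_metrics])
```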

10 citations

Journal ArticleDOI
TL;DR: In this article, a framework using convolutional neural networks is proposed to predict spatial and temporal entropic differences without the need for a reference or human opinion scores, which enables the model to capture both spatial and temporal distortions effectively and allows for robust generalization.
Abstract: We consider the problem of robust no reference (NR) video quality assessment (VQA) where the algorithms need to have good generalization performance when they are trained and tested on different datasets. We specifically address this question in the context of predicting video quality for compression and transmission applications. Motivated by the success of the spatio-temporal entropic differences video quality predictor in this context, we design a framework using convolutional neural networks to predict spatial and temporal entropic differences without the need for a reference or human opinion score. This approach enables our model to capture both spatial and temporal distortions effectively and allows for robust generalization. We evaluate our algorithms on a variety of datasets and show superior cross database performance when compared to state of the art NR VQA algorithms.
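A toy sketch of the overall framework, with an assumed two-head network and assumed input packing (frame plus frame difference); the entropic-difference targets are presumed to be precomputed with the reference videos during training only, so no reference is needed at test time:

```python
import torch
import torch.nn as nn

class EntropicDifferencePredictor(nn.Module):
    """Toy two-head CNN: predicts spatial and temporal entropic differences
    from a frame and its frame difference. Architecture is illustrative,
    not the paper's."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.spatial_head = nn.Linear(32, 1)   # regresses spatial entropic difference
        self.temporal_head = nn.Linear(32, 1)  # regresses temporal entropic difference

    def forward(self, frame_and_diff: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        z = self.backbone(frame_and_diff)      # input: (N, 2, H, W)
        return self.spatial_head(z), self.temporal_head(z)

# Training targets (spatial/temporal entropic differences) are computed with the
# reference video; at inference only the distorted video is available.
```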

10 citations

References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is validated against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
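For reference, the standard single-scale SSIM index between aligned patches x and y is:

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```

where the \mu terms are local means, the \sigma^2 terms local variances, \sigma_{xy} the local covariance, and C_1, C_2 small stabilizing constants.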

40,609 citations

Journal ArticleDOI
TL;DR: This work has recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed, without any exposure to distorted images.
Abstract: An important aim of research on the blind image quality assessment (IQA) problem is to devise perceptual models that can predict the quality of distorted images with as little prior knowledge of the images or their distortions as possible. Current state-of-the-art “general purpose” no reference (NR) IQA algorithms require knowledge about anticipated distortions in the form of training examples and corresponding human opinion scores. However, we have recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed, without any exposure to distorted images. Thus, it is “completely blind.” The new IQA model, which we call the Natural Image Quality Evaluator (NIQE), is based on the construction of a “quality aware” collection of statistical features based on a simple and successful space domain natural scene statistic (NSS) model. These features are derived from a corpus of natural, undistorted images. Experimental results show that the new index delivers performance comparable to top performing NR IQA models that require training on large databases of human opinions of distorted images. A software release is available at http://live.ece.utexas.edu/research/quality/niqe_release.zip.
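In NIQE, the quality of a test image is expressed as the distance between a multivariate Gaussian (MVG) fit of its NSS features and the MVG model learned from the natural-image corpus:

```latex
D(\nu_1, \nu_2, \Sigma_1, \Sigma_2) =
  \sqrt{(\nu_1 - \nu_2)^{\mathsf T}
        \left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{-1}
        (\nu_1 - \nu_2)}
```

with (\nu_1, \Sigma_1) the mean and covariance of the natural-image model and (\nu_2, \Sigma_2) those fitted to the test image's features.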

3,722 citations


"Deep Blind Video Quality Assessment..." refers methods in this paper

  • ...Taking averages is often used to analyze representative characteristics of global quality in QA problems [8], and standard deviation was used to analyze the global variation of local quality [12]....


Journal ArticleDOI
TL;DR: An efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics model of discrete cosine transform (DCT) coefficients, which requires minimal training and adopts a simple probabilistic model for score prediction.
Abstract: We develop an efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics (NSS) model of discrete cosine transform (DCT) coefficients. The algorithm is computationally appealing, given the availability of platforms optimized for DCT computation. The approach relies on a simple Bayesian inference model to predict image quality scores from extracted features, which are based on an NSS model of the image DCT coefficients: the estimated parameters of the NSS model are used to form features that are indicative of perceptual quality. The resulting algorithm, which we name BLIINDS-II, requires minimal training and adopts a simple probabilistic model for score prediction. Given the features extracted from a test image, the quality score that maximizes the probability of the empirically determined inference model is chosen as the predicted quality score of that image. When tested on the LIVE IQA database, BLIINDS-II is shown to correlate highly with human judgments of quality, at a level that is competitive with the popular SSIM index.
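The flavor of the DCT-domain NSS features can be conveyed with a minimal sketch that fits a generalized Gaussian to block-DCT coefficients; the real BLIINDS-II feature set and its probabilistic score-prediction model are considerably richer than this:

```python
import numpy as np
from scipy.fft import dctn
from scipy.stats import gennorm

def block_dct_shape_param(img: np.ndarray, block: int = 8) -> float:
    """Fit a generalized Gaussian to the AC block-DCT coefficients of an image
    and return its shape parameter, a simple NSS-style feature."""
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    coeffs = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            d = dctn(img[i:i + block, j:j + block].astype(np.float64), norm="ortho")
            coeffs.append(d.ravel()[1:])        # drop the DC coefficient
    beta, loc, scale = gennorm.fit(np.concatenate(coeffs), floc=0.0)
    return float(beta)                          # shape parameter of the fitted GGD
```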

1,484 citations


"Deep Blind Video Quality Assessment..." refers result in this paper

  • ...Recent IQA studies [3]-[7] have resulted in higher performance improvements compared to earlier studies [8]....


Journal ArticleDOI
TL;DR: The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the NTIA General Model.
Abstract: The National Telecommunications and Information Administration (NTIA) General Model for estimating video quality and its associated calibration techniques were independently evaluated by the Video Quality Experts Group (VQEG) in their Phase II Full Reference Television (FR-TV) test. The NTIA General Model was the only video quality estimator that was in the top performing group for both the 525-line and 625-line video tests. As a result, the American National Standards Institute (ANSI) adopted the NTIA General Model and its associated calibration techniques as a North American Standard in 2003. The International Telecommunication Union (ITU) has also included the NTIA General Model as a normative method in two Draft Recommendations. This paper presents a description of the NTIA General Model and its associated calibration techniques. The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the method.

1,268 citations

Journal ArticleDOI
TL;DR: In this paper, the reciprocal nature of these spatio-temporal interactions can be particularly clearly expressed if the threshold contrast is determined for a grating target whose luminance perpendicular to the bars is given by L(x, t) = L0[1 + m cos(2πνx) cos(2πft)], where m is the contrast, ν the spatial frequency, and f the temporal frequency of the target.
Abstract: The dependence of the form of the spatial contrast-sensitivity function for a square-wave test grating upon the duration of exposure of the target has been investigated by Schober and Hilz [1]. Kelly [2] has pointed out an analogous dependence of the form of the temporal contrast (modulation) sensitivity function upon the angular extent of the test target. The reciprocal nature of these spatio-temporal interactions can be particularly clearly appreciated if the threshold contrast is determined for a grating target whose luminance perpendicular to the bars is given by L(x, t) = L0[1 + m cos(2πνx) cos(2πft)], where m is the contrast, ν the spatial frequency, and f the temporal frequency of the target. Such a pattern was set up as a display on a cathode-ray oscilloscope, and Figs. 1 and 2 show the results of threshold-contrast measurements made by the author (a well-corrected myope). Viewing was binocular at a distance of 2 m. The grating pattern subtended 2.5° × 2.5° in the center of a 10° × 10° screen illuminated to the same mean luminance of 20 cd/m². The general similarity of the two sets of contrast-sensitivity functions is immediately evident, but two features are particularly remarkable. First, the form of the fall-off in sensitivity at high …
[Fig. 1: Spatial contrast-sensitivity (reciprocal of threshold contrast) functions for different temporal frequencies (1, 6, 16, 22 cycles per second); the curves differ only in their positions along the contrast-sensitivity scale.]
[Fig. 2: Temporal contrast-sensitivity (reciprocal of threshold contrast) functions for different spatial frequencies (0.5, 4, 16, 22 cycles per degree); the curves differ only in their positions along the contrast-sensitivity scale.]

861 citations


"Deep Blind Video Quality Assessment..." refers background in this paper

  • ...In conclusion, we use the temporal sharpness variation features because they reflect the temporal contrast sensitivity (CSF) characteristics of the human visual system with respect to variations in frame sharpness....


  • ...As the temporal variation of frame quality increases, the contrast in the time domain relevant to the temporal CSF [13] increases, so humans perceive the noise in the video....
