Proceedings ArticleDOI

No-Reference Video Quality Assessment based on Convolutional Neural Network and Human Temporal Behavior

01 Nov 2018, pp. 1513–1517
TL;DR: A deep learning scheme named Deep Blind Video Quality Assessment is proposed to achieve a more accurate and reliable video quality predictor by considering various spatial and temporal cues which have not been considered before.
Abstract: A high-performance video quality assessment (VQA) algorithm is essential for delivering high-quality video to viewers. However, because the nonlinear perceptual mapping between a video's distortion level and its subjective quality score is not precisely defined, accurately predicting video quality remains difficult. In this paper, we propose a deep learning scheme named Deep Blind Video Quality Assessment that achieves a more accurate and reliable video quality predictor by considering various spatial and temporal cues which have not been considered before. We use a CNN to extract the spatial cues of each video and propose new hand-crafted features for the temporal cues. Experiments show that the proposed model outperforms other state-of-the-art no-reference (NR) VQA models and that the introduced hand-crafted temporal features are highly effective for VQA.
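As an illustration of the two-stream idea the abstract describes, the sketch below pairs per-frame spatial features from a generic pretrained CNN with a simple hand-crafted temporal sharpness-variation statistic. The network choice (ResNet-50), the pooling, and the sharpness proxy are assumptions for illustration, not the authors' exact design.

    import numpy as np
    import torch
    import torchvision
    from torchvision import transforms

    # Stand-in spatial-cue extractor: an ImageNet ResNet-50 with its classifier
    # removed, so each frame yields a 2048-d pooled feature vector.
    cnn = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    cnn.fc = torch.nn.Identity()
    cnn.eval()

    prep = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def frame_sharpness(gray):
        # Variance of the Laplacian: a common hand-crafted sharpness proxy.
        lap = (np.gradient(np.gradient(gray, axis=0), axis=0)
               + np.gradient(np.gradient(gray, axis=1), axis=1))
        return lap.var()

    def video_features(frames):  # frames: list of HxWx3 uint8 arrays
        with torch.no_grad():
            spatial = torch.stack(
                [cnn(prep(f).unsqueeze(0)).squeeze(0) for f in frames]
            ).mean(0)  # temporal mean-pooling of per-frame CNN features
        sharp = np.array([frame_sharpness(f.mean(axis=2)) for f in frames])
        temporal = np.array([sharp.mean(), sharp.std(),
                             np.abs(np.diff(sharp)).mean()])
        return np.concatenate([spatial.numpy(), temporal])  # regressor input

A regressor (e.g., an SVR or a small fully connected network) trained on such concatenated features against subjective scores would complete the pipeline.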
Citations
Proceedings ArticleDOI
01 Jan 2020
TL;DR: This paper introduces a novel feature extraction method for no-reference video quality assessment (NR-VQA) relying on visual features extracted from multiple Inception modules of pretrained convolutional neural networks (CNN).

5 citations


Cites methods or result from "No-Reference Video Quality Assessme..."

  • ...As a consequence, some NR-VQA methods rely on features extracted from pretrained CNNs (Ahn and Lee, 2018b), (Ahn and Lee, 2018a), (Varga, 2019)....

  • ...Namely, previous works (Ahn and Lee, 2018b), (Ahn and Lee, 2018a)....

  • ...Furthermore, previous methods (Ahn and Lee, 2018b), (Ahn and Lee, 2018a), (Varga, 2019) analyze all video frames one by one to predict perceptual video quality....


Journal ArticleDOI
TL;DR: In this article, a real-time no-reference video quality assessment (VQA) method is proposed for videos encoded with the H.264/AVC codec, where temporal and spatial features are extracted from the encoded bitstream and pixel values to train and validate a fully connected neural network.
Abstract: Ever-growing video streaming services require accurate quality assessment, often with no reference to the original media. One primary challenge in developing no-reference (NR) video quality metrics is achieving real-time operation while retaining accuracy. A real-time no-reference video quality assessment (VQA) method is proposed for videos encoded with the H.264/AVC codec. Temporal and spatial features are extracted from the encoded bit-stream and pixel values to train and validate a fully connected neural network. The hand-crafted features and network dynamics are designed to ensure a high correlation with human judgment of quality while minimizing computational complexity. Proof-of-concept experiments are conducted via comparison with: 1) video sequences rated by a full-reference quality metric, and 2) H.264-encoded sequences from the LIVE video dataset which are subjectively evaluated through differential mean opinion scores (DMOS). The performance of the proposed method is verified by correlation measurements with the aforementioned objective and subjective scores. The framework achieves real-time execution while outperforming state-of-the-art full-reference and no-reference video quality assessment methods.
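A minimal sketch of the kind of fully connected regressor this abstract describes appears below. The feature count and layer sizes are assumptions; the paper's exact bitstream features (e.g., quantization parameters or motion-vector statistics) are not listed here.

    import torch
    import torch.nn as nn

    class QualityMLP(nn.Module):
        # Small fully connected network mapping hand-crafted features to a score.
        def __init__(self, n_features=12):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, 32), nn.ReLU(),
                nn.Linear(32, 1),  # predicted quality (e.g., on a DMOS-like scale)
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)

    model = QualityMLP()
    score = model(torch.randn(4, 12))  # batch of 4 illustrative feature vectors

Trained with a mean-squared-error loss against subjective (or full-reference proxy) scores, a network this small is cheap enough to run in real time alongside decoding.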

1 citation

Journal ArticleDOI
TL;DR: Results of the subjective experiments indicate that it may be possible to use information gained from older datasets to describe the perceived quality of more recent compression algorithms.
Abstract: There is a continuing demand for objective measures that predict perceived media quality. Researchers are developing new methods for mapping technical parameters of digital media to perceived quality. It is common to use machine learning algorithms for these purposes, especially deep learning algorithms, which need large amounts of data for training. In this paper, we aim to obtain more training data with recent types of distortions. Instead of conducting expensive subjective experiments, we evaluate the reuse of previously published, well-known image datasets with subjective annotation. In this contribution, we present a procedure for mapping Mean Opinion Scores (MOS) from an already published subjectively annotated dataset with older codecs to new codecs. In particular, we map from Joint Photographic Experts Group (JPEG) distortions to newer High Efficiency Video Coding (HEVC) distortions, using the values of three different objective methods as the connection between these two distortion types. To investigate the significance of our approach, subjective verification tests were designed and conducted. The design goals led to two types of experiments, a Pair Comparison (PC) test and an Absolute Category Rating (ACR) test, in which 40 participants provided their opinions. Results of the subjective experiments indicate that it may be possible to use information gained from older datasets to describe the perceived quality of more recent compression algorithms.
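One plausible form of such a mapping is sketched below: an objective metric score bridges the two distortion types via a fit learned on the old dataset. The polynomial form and all numbers are illustrative assumptions, not the paper's actual procedure or metric choices.

    import numpy as np

    def fit_mos_vs_metric(metric_jpeg, mos_jpeg, degree=3):
        # Fit MOS = f(objective score) on the annotated JPEG dataset.
        return np.polynomial.Polynomial.fit(metric_jpeg, mos_jpeg, degree)

    def transfer_mos(f, metric_hevc):
        # Predict pseudo-MOS for HEVC-distorted images from their metric
        # scores, clipped to the conventional 5-point MOS scale.
        return np.clip(f(metric_hevc), 1.0, 5.0)

    # Invented placeholder values, purely to show the call pattern.
    f = fit_mos_vs_metric(np.array([0.6, 0.7, 0.8, 0.9, 0.95]),
                          np.array([1.8, 2.5, 3.3, 4.1, 4.6]))
    pseudo_mos = transfer_mos(f, np.array([0.75, 0.88]))

The resulting pseudo-MOS labels are what the subjective PC and ACR verification tests are designed to validate.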

1 citation

Journal ArticleDOI
TL;DR: A no-reference framework is proposed for estimating the quality of video streamed over wireless networks; it demonstrates a stronger correlation with subjective evaluations on two separate video databases than other existing algorithms.
Abstract: In this work, a no-reference framework is proposed for estimating the quality of video streamed over wireless networks. The work presents a comprehensive survey of existing full-reference (FR), reduced-reference (RR), and no-reference (NR) algorithms. A comparison has been made among existing algorithms, in terms of subjective correlation and feasibility of use in a wireless architecture, to motivate the proposed framework and its goal of overcoming the limitations of existing algorithms. A brief summary of our previously published algorithms, i.e. NR blockiness, NR blur, NR network, NR just-noticeable-distortion, and RR, is also presented; these algorithms are used as function modules in the proposed framework. The framework measures video quality by taking into account major spatial, temporal, and network impairments as well as human visual system effects for a comprehensive quality evaluation. It can measure the quality of video compressed by different codecs, e.g. MPEG-x/H.26x, Motion JPEG, and JPEG2000, and can work with two different kinds of received data, i.e. bit streams and decoded pixels. The framework is an integration of the RR and NR methods and can work in three different modes depending on the availability of RR data: 1) RR-only measurement, 2) a hybrid of RR and NR measurement, and 3) NR-only estimation. In addition, any individual function block, e.g. blurring, can also be used independently for a particular specific distortion. A new subjective video quality database containing compressed and distorted videos (due to channel-induced distortions) was also developed to test the proposed framework, which has additionally been tested on the publicly available LIVE Video Quality Database. Overall test results show that our framework demonstrates a stronger correlation with subjective evaluations on the two separate video databases than other existing algorithms. The proposed framework also shows good results when working only in NR mode compared with existing RR and FR algorithms, and it is more scalable and feasible to use with any available network bandwidth than other algorithms, since its different modes use different function modules.
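Structurally, the three operating modes reduce to a dispatch on RR-data availability, as in the sketch below. The module interfaces, the hybrid weighting, and the "partial" flag are assumptions for illustration, not the authors' specification.

    from typing import Callable, Dict, Optional

    def assess(features: Dict[str, float],
               rr_data: Optional[dict],
               nr_score: Callable[[dict], float],
               rr_score: Callable[[dict, dict], float]) -> float:
        if rr_data is None:                 # mode 3: NR-only estimation
            return nr_score(features)
        if rr_data.get("partial", False):   # mode 2: hybrid RR + NR measurement
            return 0.5 * rr_score(features, rr_data) + 0.5 * nr_score(features)
        return rr_score(features, rr_data)  # mode 1: RR-only measurement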

1 citation


Cites methods from "No-Reference Video Quality Assessme..."

  • ...In another approach [39], an NR deep blind video quality assessment method considers various spatial and temporal cues, where spatial cues are obtained with a deep CNN and temporal features are derived from the spatial cues....

References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is validated against both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
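For reference, the single-scale SSIM index between aligned image patches x and y takes the standard form (with C_1, C_2 small stabilizing constants):

    \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu, \sigma^2, and \sigma_{xy} are local means, variances, and covariance; the per-patch values are averaged over the image to give a global score.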

40,609 citations

Journal ArticleDOI
TL;DR: This work has recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed, without any exposure to distorted images.
Abstract: An important aim of research on the blind image quality assessment (IQA) problem is to devise perceptual models that can predict the quality of distorted images with as little prior knowledge of the images or their distortions as possible. Current state-of-the-art “general purpose” no-reference (NR) IQA algorithms require knowledge about anticipated distortions in the form of training examples and corresponding human opinion scores. However, we have recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed, without any exposure to distorted images. Thus, it is “completely blind.” The new IQA model, which we call the Natural Image Quality Evaluator (NIQE), is based on the construction of a “quality aware” collection of statistical features based on a simple and successful space domain natural scene statistic (NSS) model. These features are derived from a corpus of natural, undistorted images. Experimental results show that the new index delivers performance comparable to top performing NR IQA models that require training on large databases of human opinions of distorted images. A software release is available at http://live.ece.utexas.edu/research/quality/niqe_release.zip.
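In NIQE, quality is expressed as the distance between two multivariate Gaussian (MVG) fits of the NSS features, one from the pristine corpus and one from the test image:

    D(\nu_1, \nu_2, \Sigma_1, \Sigma_2) = \sqrt{(\nu_1 - \nu_2)^{\mathsf{T}} \left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{-1} (\nu_1 - \nu_2)}

where \nu_1, \Sigma_1 and \nu_2, \Sigma_2 are the mean vectors and covariance matrices of the two MVG models; larger distances indicate lower predicted quality.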

3,722 citations


"No-Reference Video Quality Assessme..." refers methods in this paper

  • ...Taking averages is often used to analyze representative characteristics of global quality in QA problems [8], and standard deviation was used to analyze the global variation of local quality [12]....

Journal ArticleDOI
TL;DR: An efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics model of discrete cosine transform (DCT) coefficients, which requires minimal training and adopts a simple probabilistic model for score prediction.
Abstract: We develop an efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics (NSS) model of discrete cosine transform (DCT) coefficients. The algorithm is computationally appealing, given the availability of platforms optimized for DCT computation. The approach relies on a simple Bayesian inference model to predict image quality scores given certain extracted features. The features are based on an NSS model of the image DCT coefficients. The estimated parameters of the model are utilized to form features that are indicative of perceptual quality. These features are used in a simple Bayesian inference approach to predict quality scores. The resulting algorithm, which we name BLIINDS-II, requires minimal training and adopts a simple probabilistic model for score prediction. Given the extracted features from a test image, the quality score that maximizes the probability of the empirically determined inference model is chosen as the predicted quality score of that image. When tested on the LIVE IQA database, BLIINDS-II is shown to correlate highly with human judgments of quality, at a level that is competitive with the popular SSIM index.
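The core NSS-feature idea, fitting a generalized Gaussian density (GGD) to pooled block-DCT coefficients, can be sketched as below. This is a moment-matching illustration of the technique only, not BLIINDS-II's full feature set or its Bayesian score predictor.

    import numpy as np
    from scipy.fftpack import dct
    from scipy.special import gamma as G

    def block_dct_ac(img, b=5):
        # 2-D DCT over non-overlapping b x b blocks; drop each block's DC term.
        h, w = (img.shape[0] // b) * b, (img.shape[1] // b) * b
        blocks = img[:h, :w].reshape(h // b, b, w // b, b).swapaxes(1, 2)
        coeffs = dct(dct(blocks, axis=-1, norm="ortho"), axis=-2, norm="ortho")
        return coeffs.reshape(-1, b * b)[:, 1:].ravel()

    def ggd_shape(x, grid=np.arange(0.2, 10.0, 0.01)):
        # Estimate the GGD shape parameter by matching the sample ratio
        # E[|x|]^2 / E[x^2] against its closed form over a grid of shapes.
        r = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
        rho = G(2 / grid) ** 2 / (G(1 / grid) * G(3 / grid))
        return grid[np.argmin((rho - r) ** 2)]

    shape = ggd_shape(block_dct_ac(np.random.rand(240, 320)))

Distortions shift such shape (and related energy-ratio) features away from their natural-image statistics, which is what the quality model exploits.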

1,484 citations


"No-Reference Video Quality Assessme..." refers result in this paper

  • ...Recent IQA studies [3]-[7] have resulted in higher performance improvements compared to earlier studies [8]....

Journal ArticleDOI
TL;DR: The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the NTIA General Model.
Abstract: The National Telecommunications and Information Administration (NTIA) General Model for estimating video quality and its associated calibration techniques were independently evaluated by the Video Quality Experts Group (VQEG) in their Phase II Full Reference Television (FR-TV) test. The NTIA General Model was the only video quality estimator that was in the top performing group for both the 525-line and 625-line video tests. As a result, the American National Standards Institute (ANSI) adopted the NTIA General Model and its associated calibration techniques as a North American Standard in 2003. The International Telecommunication Union (ITU) has also included the NTIA General Model as a normative method in two Draft Recommendations. This paper presents a description of the NTIA General Model and its associated calibration techniques. The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the method.

1,268 citations

Journal ArticleDOI
TL;DR: In this paper, the reciprocal nature of spatio-temporal interactions in contrast sensitivity is expressed particularly clearly by determining the threshold contrast for a grating target whose luminance perpendicular to the bars is modulated sinusoidally in both space and time, where m is the contrast, ν the spatial frequency, and f the temporal frequency of the target.
Abstract: The dependence of the form of the spatial contrast-sensitivity function for a square-wave test grating upon the duration of exposure of the target has been investigated by Schober and Hilz.[1] Kelly[2] has pointed out an analogous dependence of the form of the temporal contrast (modulation) sensitivity function upon the angular extent of the test target. The reciprocal nature of these spatio-temporal interactions can be particularly clearly appreciated if the threshold contrast is determined for a grating target whose luminance perpendicular to the bars is given by L(x, t) = L̄[1 + m cos(2πνx) cos(2πft)], where m is the contrast, ν the spatial frequency, and f the temporal frequency of the target. Such a pattern was set up as a display on a cathode-ray oscilloscope, and Figs. 1 and 2 show the results of threshold-contrast measurements made by the author (a well-corrected myope). Viewing was binocular at a distance of 2 m. The grating pattern subtended 2.5°×2.5° in the center of a 10°×10° screen illuminated to the same mean luminance of 20 cd/m². The general similarity of the two sets of contrast-sensitivity functions is immediately evident, but two features are particularly remarkable. First, the form of the fall-off in sensitivity at high …

[Fig. 1: Spatial contrast-sensitivity (reciprocal of threshold contrast) functions for different temporal frequencies: 1, 6, 16, and 22 cycles per second.]
[Fig. 2: Temporal contrast-sensitivity (reciprocal of threshold contrast) functions for different spatial frequencies: 0.5, 4, 16, and 22 cycles per degree.]

861 citations


"No-Reference Video Quality Assessme..." refers background in this paper

  • ...In conclusion, we use the temporal sharpness variation features because they reflect the human visual system's temporal-variation CSF characteristics with respect to frame sharpness....

  • ...As the temporal variation of frame quality increases, the contrast in the time domain of the temporal CSF [13] increases, so humans perceive the noise in the video....
