
Showing papers by "Alan C. Bovik published in 2019"


Journal ArticleDOI
TL;DR: The LIVE Video Quality Challenge database (LIVE-VQC) introduced in this paper is a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions.
Abstract: The great variation in videographic skills, camera designs, compression and processing protocols, communication and bandwidth environments, and displays leads to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers, and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real-world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, complex, and often commingled distortions that are difficult or impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Toward advancing NR video quality prediction, we have constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding over 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge database (LIVE-VQC), by conducting a comparison with leading NR video quality predictors on it. This is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download at http://live.ece.utexas.edu/research/LIVEVQC/index.html.

176 citations


Posted Content
TL;DR: This work introduces the largest (by far) subjective picture quality database, containing about 40,000 real-world distorted pictures and 120,000 patches, and builds deep region-based architectures that learn to produce state-of-the-art global picture quality predictions as well as useful local picture quality maps.
Abstract: Blind or no-reference (NR) perceptual picture quality prediction is a difficult, unsolved problem of great consequence to the social and streaming media industries that impacts billions of viewers daily. Unfortunately, popular NR prediction models perform poorly on real-world distorted pictures. To advance progress on this problem, we introduce the largest (by far) subjective picture quality database, containing about 40,000 real-world distorted pictures and 120,000 patches, on which we collected about 4M human judgments of picture quality. Using these picture and patch quality labels, we built deep region-based architectures that learn to produce state-of-the-art global picture quality predictions as well as useful local picture quality maps. Our innovations include picture quality prediction architectures that produce global-to-local inferences as well as local-to-global inferences (via feedback).

94 citations


Journal ArticleDOI
TL;DR: In rigorous experiments, the proposed algorithms demonstrate state-of-the-art performance on multiple video applications, and are made available as part of the open source package at https://github.com/Netflix/vmaf.
Abstract: The recently developed video multi-method assessment fusion (VMAF) framework integrates multiple quality-aware features to accurately predict video quality. However, VMAF does not yet exploit important principles of temporal perception that are relevant to perceptual video distortion measurement. Here, we propose two improvements to the VMAF framework, called spatiotemporal VMAF and ensemble VMAF, which leverage perceptually motivated space–time features that are efficiently calculated at multiple scales. We also conducted a large subjective video study, which we have found to be an excellent resource for training our feature-based approaches. In rigorous experiments, we found that the proposed algorithms deliver state-of-the-art performance on multiple video applications. The proposed algorithms will be made available as part of the open source package at https://github.com/Netflix/vmaf.
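As a rough, hypothetical illustration of the feature-fusion step such frameworks rely on (not the actual VMAF training code, which lives in the linked repository), the sketch below fits a support vector regressor mapping a handful of per-video quality-aware features to subjective scores; the feature matrix and scores are random placeholders.

```python
# Illustrative sketch only: fusing quality-aware features into a single quality
# score with a support vector regressor, in the spirit of VMAF-style fusion.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# rows: videos, columns: per-video quality-aware features (e.g. spatial fidelity
# and multiscale temporal measurements aggregated over frames) -- placeholders here
train_features = np.random.rand(200, 6)
train_mos = np.random.rand(200) * 100          # placeholder subjective scores

fusion_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=4.0, epsilon=0.3))
fusion_model.fit(train_features, train_mos)

test_features = np.random.rand(10, 6)
predicted_quality = fusion_model.predict(test_features)
print(predicted_quality)
```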

90 citations


Journal ArticleDOI
TL;DR: A new mobile video quality database is presented that contains videos afflicted with distortions caused by 26 different stalling patterns; the database is being made publicly available in order to help advance state-of-the-art research on user-centric mobile network planning and management.
Abstract: Over-the-top mobile adaptive video streaming is invariably influenced by volatile network conditions, which can cause playback interruptions (stalling or rebuffering events) and bitrate fluctuations, thereby impairing users’ quality of experience (QoE). Video quality assessment models that can accurately predict users’ QoE under such volatile network conditions are rapidly gaining attention, since these methods could enable more efficient design of quality control protocols for media-driven services such as YouTube, Amazon, Netflix, and many others. However, the development of improved QoE prediction models requires data sets of videos afflicted with diverse stalling events that have been labeled with ground-truth subjective opinion scores. Toward this end, we have created a new mobile video quality database that we call LIVE Mobile Stall Video Database-II. Our database contains a total of 174 videos afflicted with distortions caused by 26 different stalling patterns. We describe the way we simulated the diverse stalling events to create a corpus of distorted videos, and we detail the human study we conducted to obtain continuous-time subjective scores from 54 subjects. We also present the outcomes of our comprehensive analysis of the impact of several factors that influence subjective QoE, and report the performance of existing QoE-prediction models on our data set. We are making the database (videos, subjective data, and video metadata) publicly available in order to help advance state-of-the-art research on user-centric mobile network planning and management. The database may be accessed at http://live.ece.utexas.edu/research/LIVEStallStudy/liveMobile.html.

87 citations


Journal ArticleDOI
TL;DR: A novel two-step image quality prediction concept that combines NR with R quality measurements; the approach is versatile, as it can use any desired R and NR components, and a new, first-of-a-kind dedicated database specialized for the design and testing of two-step IQA models was constructed.
Abstract: In a typical communication pipeline, images undergo a series of processing steps that can cause visual distortions before being viewed. Given a high quality reference image, a reference (R) image quality assessment (IQA) algorithm can be applied after compression or transmission. However, the assumption of a high quality reference image is often not fulfilled in practice, thus contributing to less accurate quality predictions when using stand-alone R IQA models. This is particularly common on social media, where hundreds of billions of user-generated photos and videos containing diverse, mixed distortions are uploaded, compressed, and shared annually on sites like Facebook, YouTube, and Snapchat. The qualities of the pictures that are uploaded to these sites vary over a very wide range. While this is an extremely common situation, the problem of assessing the qualities of compressed images against their pre-compressed, but often severely distorted (reference) pictures has been little studied. Towards ameliorating this problem, we propose a novel two-step image quality prediction concept that combines NR with R quality measurements. Applying a first stage of NR IQA to determine the possibly degraded quality of the source image yields information that can be used to quality-modulate the R prediction to improve its accuracy. We devise a simple and efficient weighted product model of R and NR stages, which combines a pre-compression NR measurement with a post-compression R measurement. This first-of-a-kind two-step approach produces more reliable objective prediction scores. We also constructed a new, first-of-a-kind dedicated database specialized for the design and testing of two-step IQA models. Using this new resource, we show that two-step approaches yield outstanding performance when applied to compressed images whose original, pre-compression quality covers a wide range of realistic distortion types and severities. The two-step concept is versatile as it can use any desired R and NR components. We are making the source code of a particularly efficient model that we call 2stepQA publicly available at https://github.com/xiangxuyu/2stepQA . We are also providing the dedicated new two-step database free of charge at http://live.ece.utexas.edu/research/twostep/index.html .
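As a loose, hypothetical sketch of the two-step idea (the exact NR and R components, score ranges, and weighting used by 2stepQA are in the linked code, not reproduced here), one can quality-modulate a full-reference fidelity score by a normalized no-reference score of the source via a weighted product:

```python
# Hedged sketch of a two-step weighted-product combination. The NR/R components,
# score range, and exponent below are illustrative placeholders, not the exact
# 2stepQA formulation.
def two_step_quality(nr_source_score, r_compressed_score,
                     nr_range=(0.0, 100.0), weight=0.5):
    """nr_source_score: NR quality of the (possibly degraded) pre-compression
    source, where lower is better (e.g. a NIQE-like scale).
    r_compressed_score: R fidelity of the compressed image against that source,
    in [0, 1], where higher is better (e.g. an SSIM-like scale)."""
    lo, hi = nr_range
    clipped = min(max(nr_source_score, lo), hi)
    nr_normalized = 1.0 - (clipped - lo) / (hi - lo)   # map to [0, 1], higher = better
    return (nr_normalized ** weight) * (r_compressed_score ** (1.0 - weight))

# a source of middling quality, compressed with high fidelity
print(two_step_quality(nr_source_score=25.0, r_compressed_score=0.9))
```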

45 citations


Journal ArticleDOI
TL;DR: The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D); it delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and the proposed LFOVIAS3DPh2 S3D video dataset.
Abstract: We present a new subjective and objective study on full high-definition (HD) stereoscopic (3D or S3D) video quality. In the subjective study, we constructed an S3D video dataset with 12 pristine and 288 test videos, where the test videos were generated by applying H.264 and H.265 compression, blur, and frame freeze artifacts. We also propose a no reference (NR) objective video quality assessment (QA) algorithm that relies on measurements of the statistical dependencies between the motion and disparity subband coefficients of S3D videos. Inspired by the Generalized Gaussian Distribution (GGD) approach, we model the joint statistical dependencies between the motion and disparity components as following a Bivariate Generalized Gaussian Distribution (BGGD). We estimate the BGGD model parameters ($\alpha, \beta$) and the coherence measure ($\Psi$) from the eigenvalues of the sample covariance matrix (M) of the BGGD. In turn, we model the BGGD parameters of pristine S3D videos using a Multivariate Gaussian (MVG) distribution. The likelihood of a test video’s MVG model parameters coming from the pristine MVG model is computed and shown to play a key role in the overall quality estimation. We also estimate the global motion content of each video by averaging the SSIM scores between pairs of successive video frames. To estimate the test S3D video’s spatial quality, we apply the popular 2D NR unsupervised NIQE image QA model on a frame-by-frame basis on both views. The overall quality of a test S3D video is finally computed by pooling the test S3D video’s likelihood estimates, global motion strength, and spatial quality scores. The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D). We show that MoDi3D delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and our proposed LFOVIAS3DPh2 S3D video dataset.
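A minimal sketch of the pristine-versus-test comparison step described above, assuming the per-video features (e.g. the BGGD parameters) have already been extracted: a multivariate Gaussian is fit to pristine feature vectors, and a Mahalanobis-style distance stands in for the likelihood computation. The random arrays are placeholders, not features from the paper's dataset.

```python
# Hedged sketch of fitting an MVG model to pristine features and scoring a test
# video by its distance from that model, in the style of NIQE-like approaches.
import numpy as np

def fit_mvg(pristine_features):
    """pristine_features: (n_videos, n_features) array of per-video features."""
    mu = pristine_features.mean(axis=0)
    cov = np.cov(pristine_features, rowvar=False)
    return mu, cov

def mvg_distance(test_features, mu, cov):
    """Mahalanobis-style distance of one feature vector from the pristine model."""
    diff = test_features - mu
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

pristine = np.random.rand(50, 3)        # placeholder (alpha, beta, coherence) per video
mu, cov = fit_mvg(pristine)
test = np.random.rand(3)                # placeholder features of one test video
print(mvg_distance(test, mu, cov))      # larger distance -> lower predicted quality
```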

30 citations


Journal ArticleDOI
TL;DR: A novel cloud detection algorithm for optical RS images, whereby test images are separated into three classes: thick clouds, thin clouds, and noncloudy; a simple linear iterative clustering algorithm is adopted that is able to segment potential clouds, including small clouds.
Abstract: Cloud detection is an important task in remote sensing (RS) image processing. Numerous cloud detection algorithms have been developed. However, most existing methods suffer from the weakness of omitting small and thin clouds, and from an inability to discriminate clouds from photometrically similar regions, such as buildings and snow. Here, we derive a novel cloud detection algorithm for optical RS images, whereby test images are separated into three classes: thick clouds, thin clouds, and noncloudy. First, a simple linear iterative clustering algorithm is adopted that is able to segment potential clouds, including small clouds. Then, a natural scene statistics model is applied to the superpixels to distinguish between clouds and surface buildings. Finally, Gabor features are computed within each superpixel and a support vector machine is used to distinguish clouds from snow regions. The experimental results indicate that the proposed model outperforms state-of-the-art methods for cloud detection.
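The sketch below is a rough, hypothetical rendering of the three-stage pipeline (superpixels, per-superpixel features, SVM classification) using standard scikit-image and scikit-learn routines; the Gabor frequency, feature choices, labels, and image data are placeholders rather than the paper's actual settings.

```python
# Illustrative superpixel + Gabor + SVM pipeline sketch; all data are synthetic.
import numpy as np
from skimage.segmentation import slic
from skimage.filters import gabor
from sklearn.svm import SVC

def superpixel_gabor_features(image, n_segments=300):
    """image: 2-D grayscale array in [0, 1]. Returns per-superpixel features."""
    # channel_axis=None marks the input as single-channel (scikit-image >= 0.19)
    segments = slic(image, n_segments=n_segments, channel_axis=None, start_label=0)
    gabor_real, _ = gabor(image, frequency=0.2)     # one Gabor response as an example
    features = []
    for label in np.unique(segments):
        mask = segments == label
        features.append([image[mask].mean(), gabor_real[mask].mean()])
    return np.array(features), segments

image = np.random.rand(256, 256)                    # placeholder RS scene
features, segments = superpixel_gabor_features(image)
labels = np.random.randint(0, 2, len(features))     # placeholder cloud/snow labels
classifier = SVC(kernel="rbf").fit(features, labels)
print(classifier.predict(features[:5]))
```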

28 citations


Journal ArticleDOI
TL;DR: A new image quality assessment (IQA) measure is proposed that supports the visual qualitative analysis of pansharpened outcomes by using the statistics of natural images, commonly referred to as natural scene statistics (NSS), to extract statistical regularities from PS images.
Abstract: Pan-sharpening (PS) is a method of fusing the spatial details of a high-resolution panchromatic (PAN) image with the spectral information of a low-resolution multi-spectral (MS) image. Visual inspection is a crucial step in the evaluation of fused products whose subjectivity renders the assessment of pansharpened data a challenging problem. Most previous research on the development of PS algorithms has only superficially addressed the issue of qualitative evaluation, generally by depicting visual representations of the fused images. Hence, it is highly desirable to be able to predict pan-sharpened image quality automatically and accurately, as it would be perceived and reported by human viewers. Such a method is indispensable for the correct evaluation of PS techniques that produce images for visual applications such as Google Earth and Microsoft Bing. Here, we propose a new image quality assessment (IQA) measure that supports the visual qualitative analysis of pansharpened outcomes by using the statistics of natural images, commonly referred to as natural scene statistics (NSS), to extract statistical regularities from PS images. Importantly, NSS are measurably modified by the presence of distortions. We analyze six PS methods in the presence of two common distortions, blur and white noise, on PAN images. Furthermore, we conducted a human study on the subjective quality of pristine and degraded PS images and created a completely blind (opinion-unaware) fused image quality analyzer. In addition, we propose an opinion-aware fused image quality analyzer, whose predictions with respect to human perceptual evaluations of pansharpened images are highly correlated.
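As a small, generic illustration of the kind of natural scene statistics (NSS) measurement such models build on (not the specific features of this paper), the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients, whose empirical statistics are measurably perturbed by distortions such as blur and noise; the window width and stabilizing constant are conventional choices, and the image is a placeholder.

```python
# Hedged sketch of an MSCN-based NSS measurement on a grayscale image.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(image, sigma=7.0 / 6.0, c=1e-3):
    """image: 2-D grayscale float array. Returns locally normalized luminance."""
    mu = gaussian_filter(image, sigma)
    local_var = gaussian_filter(image * image, sigma) - mu * mu
    sigma_local = np.sqrt(np.abs(local_var))
    return (image - mu) / (sigma_local + c)

image = np.random.rand(128, 128)          # placeholder pan-sharpened band
mscn = mscn_coefficients(image)
# simple NSS summary features: variance and kurtosis of the MSCN coefficients
variance = mscn.var()
kurtosis = ((mscn - mscn.mean()) ** 4).mean() / (variance ** 2)
print(variance, kurtosis)
```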

19 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: The spatiotemporal statistics of a wide variety of natural videos are studied, new directional temporal statistical models of videos are constructed, and it is investigated whether measures of directional spatiotemporal naturalness can be developed that are predictive of quality.
Abstract: Today, a wide variety of casual users of diverse videographic skills and styles capture a large portion of all videos, often with unsteady hands and operating under difficult lighting conditions. These videos are taken with many types of camera devices having different characteristics, resulting in a wide range and diversity of video qualities. These are the kinds of videos shared on YouTube, Snapchat, and Facebook. Being able to predict the quality of these videos is an important goal for a variety of invested practitioners, including camera designers, cloud engineers, and users who could be directed to recapture videos of poor quality. In nearly every instance, a high-quality reference video is not available; hence, blind video quality predictors are of the greatest interest. Towards advancing this area, we have studied the spatiotemporal statistics of a wide variety of natural videos, constructed new directional temporal statistical models of videos, and studied whether measures of directional spatiotemporal naturalness can be developed that are predictive of quality.
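As a toy, hypothetical example of the sort of temporal statistic involved (the actual directional space-time models are more elaborate), the sketch below summarizes the distribution of successive frame differences of a clip; the clip here is random placeholder data.

```python
# Illustrative temporal-statistics sketch: frame differences and their spread
# and tail weight, a regularity that distortions tend to disturb.
import numpy as np

def frame_difference_stats(video):
    """video: (num_frames, height, width) array of grayscale frames."""
    diffs = np.diff(video.astype(np.float64), axis=0)
    d = diffs.ravel()
    d = d - d.mean()
    variance = d.var()
    kurtosis = (d ** 4).mean() / (variance ** 2 + 1e-12)
    return variance, kurtosis

video = np.random.rand(30, 64, 64)       # placeholder clip
print(frame_difference_stats(video))
```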

16 citations


Journal ArticleDOI
TL;DR: This paper develops models that predict the effect of image quality on the detection of improvised explosive device components by bomb technicians in images taken using portable X-ray systems, and introduces a new suite of statistical task prediction models believed to be the first NSS-based models for security X-ray images.
Abstract: Developing methods to predict how image quality affects task performance is a topic of great interest in many applications. While such studies have been performed in the medical imaging community, little work has been reported in the security X-ray imaging literature. In this paper, we develop models that predict the effect of image quality on the detection of improvised explosive device (IED) components by bomb technicians in images taken using portable X-ray systems. Using a newly developed NIST-LIVE X-Ray Task Performance Database, we created a set of objective algorithms that predict bomb technician detection performance based on measures of image quality. Our basic measures are traditional image quality indicators (IQIs) and perceptually relevant natural scene statistics (NSS)-based measures that have been extensively used in visible light image quality prediction algorithms. We show that these measures are able to quantify the perceptual severity of degradations and can predict the performance of expert bomb technicians in identifying threats. Combining NSS- and IQI-based measures yields even better task performance prediction than either of these methods independently. We also developed a new suite of statistical task prediction models that we refer to as quality inspectors of X-ray images (QUIX); we believe this is the first NSS-based model for security X-ray images. We also show that QUIX can be used to reliably predict conventional IQI metric values on distorted X-ray images.

13 citations


Journal ArticleDOI
TL;DR: In this paper, the authors apply a promising method for the automatic extraction of channel presence, called RivaMap, to both synthetic and experimental data sets to investigate the changes experienced by the system in response to five changes in forcings.
Abstract: River deltas are complex, dynamic systems whose channel networks evolve in response to internal and external forcings. To capture these changes, methods to extract and analyze deltaic morphodynamics automatically using available remotely sensed imagery and experimental observations are needed. Here, we apply a promising method for the automatic extraction of channel presence, called RivaMap, to both synthetic and experimental data sets to investigate the changes experienced by the system in response to five changes in forcings. RivaMap is an automated method to extract nonbinarized channel locations from imagery based on a singularity index that combines the multiscale first and second derivatives of the image intensity to favor the identification of curvilinear features and suppress edges. We quantify how the channelization varies by computing the channelized response variance (CRV), which we define as the variance of each pixel's singularity index response through time. We find that increasing magnitudes of sediment inflow (Qs) and water inflow (Qw) result in corresponding increases in the maximum CRV. We find that increasing the ratio of Qs to Qw results in an increased number of channelized areas. We see that adding cohesion to the exposed sediment surface of the experimental delta results in a decreased magnitude and a decreased number of channelized areas in the CRV. Finally, by observing changes to the CRV over time, we are able to quantify the timescale of internal channel reorganization events as the experimental delta evolves under constant forcings.
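Since the CRV is defined above as the per-pixel variance of the singularity-index response through time, a minimal sketch follows; the singularity-index maps themselves (produced by RivaMap) are taken as given, and the random stack is a placeholder.

```python
# Minimal CRV sketch: per-pixel variance of singularity-index maps through time.
import numpy as np

def channelized_response_variance(responses):
    """responses: (num_timesteps, height, width) stack of singularity-index maps."""
    return responses.var(axis=0)          # variance through time at each pixel

responses = np.random.rand(20, 100, 100)  # placeholder response stack
crv = channelized_response_variance(responses)
print(crv.max())                          # e.g. maximum CRV over the delta surface
```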

Proceedings ArticleDOI
11 May 2019
TL;DR: This paper presents an opinion-unaware BIQA measure of super resolved images based on optimally extracted perceptual features selected using a floating forward search whose objective function is the correlation with human judgment.
Abstract: The visual quality of images resulting from Super Resolution (SR) techniques is predicted with blind image quality assessment (BIQA) models trained on one or more databases of human-rated distorted images and associated human subjective opinion scores. Such opinion-aware (OA) methods need a large number of training samples with associated human subjective scores, which are scarce in the field of SR. By contrast, opinion distortion unaware (ODU) methods do not need human subjective scores for training. This paper presents an opinion-unaware BIQA measure of super-resolved images based on optimally extracted perceptual features. This set of features was selected using a floating forward search whose objective function is the correlation with human judgment. The proposed BIQA method does not need any distorted images or subjective quality scores for training, yet the experiments demonstrate its superior quality-prediction performance relative to state-of-the-art opinion-unaware BIQA methods, and show that it is competitive with state-of-the-art opinion-aware BIQA methods.
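A simplified, hypothetical sketch of correlation-driven feature selection follows; it uses plain greedy forward selection rather than the floating forward search of the paper, scores each candidate subset by the Spearman correlation between a linear fit's predictions and the subjective scores, and runs on random placeholder data.

```python
# Hedged sketch of forward feature selection with a correlation objective.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def forward_select(features, mos, n_select=3):
    """features: (n_images, n_features); mos: (n_images,) subjective scores."""
    selected, remaining = [], list(range(features.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for j in remaining:
            cols = selected + [j]
            pred = LinearRegression().fit(features[:, cols], mos).predict(features[:, cols])
            scores.append(spearmanr(pred, mos)[0])   # correlation with human judgment
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

X = np.random.rand(120, 10)    # placeholder perceptual features
y = np.random.rand(120)        # placeholder human scores
print(forward_select(X, y))
```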

Journal ArticleDOI
TL;DR: Eye-tracking experiments on humans viewing S3D images in SSID conclude that angular disparity features have a strong correlation with human judgments of discomfort.

Journal ArticleDOI
TL;DR: An LWIR image face recognition framework is proposed, based on thermal signature templates and enhanced by natural scene statistics image quality descriptors, yielding a novel infrared face recognition system that is robust to image quality distortions.
Abstract: Face identification systems that operate on long wave infrared (LWIR) images are able to overcome some of the limitations of approaches based on visible images, such as dealing effectively with ill...

Journal ArticleDOI
TL;DR: This work presents a novel approach to conducting no-reference artifact detection in digital videos, implemented as an efficient and unique dual-path (parallel) excitatory/inhibitory neural network that uses a simple discrimination rule to define a bank of accurate distortion detectors.
Abstract: Automatically identifying the locations and severities of video artifacts without the advantage of an original reference video is a difficult task. We present a novel approach to conducting no-reference artifact detection in digital videos, implemented as an efficient and unique dual-path (parallel) excitatory/inhibitory neural network that uses a simple discrimination rule to define a bank of accurate distortion detectors. The learning engine is distortion-sensitized by pre-processing each video using a statistical image model. The overall system is able to produce full-resolution space–time distortion maps for visualization, providing global distortion detection decisions that represent the state of the art in performance. Our model, which we call the video impairment mapper (VIDMAP), produces a first-of-a-kind full-resolution map of artifact detection probabilities. The current realization of this system is able to accurately detect and map eight of the most important artifact categories encountered during streaming video source inspection: aliasing, video encoding corruptions, quantization, contours/banding, combing, compression, dropped frames, and upscaling artifacts. We show that it is either competitive with or significantly outperforms the previous state of the art on the whole-image artifact detection task. A software release of VIDMAP that has been trained to detect and map these artifacts is available online: http://live.ece.utexas.edu/research/quality/VIDMAP_release.zip for public use and evaluation.

Book ChapterDOI
03 Sep 2019
TL;DR: The experiments on a real-world dataset consisting of 150 BF TEM images containing approximately 2,700 NPs show that the proposed method outperforms five current state-of-the-art approaches in overlapping NP segmentation.
Abstract: Transmission electron microscopy (TEM) provides information about inorganic nanoparticles that no other method is able to deliver. Yet, a major task when studying inorganic nanoparticles using TEM is the automated analysis of the images, i.e., the segmentation of individual nanoparticles. The current state-of-the-art methods generally rely on binarization routines that require parameterization, and on methods to segment the overlapping nanoparticles (NPs) using highly idealized nanoparticle shape models. It is unclear, however, whether there is any way to determine the best set of parameters providing an optimal segmentation, given the great diversity of NP characteristics, such as shape and size, that may be encountered. Towards remedying these barriers, this paper introduces a method for segmentation of NPs in Bright Field (BF) TEM images. The proposed method involves three main steps: binarization, contour evidence extraction, and contour estimation. For the binarization, a model based on the U-Net architecture is trained to convert an input image into its binarized version. The contour evidence extraction starts by recovering contour segments from a binarized image using concave contour point detection. The contour segments which belong to the same nanoparticle are grouped in the segment grouping step. The grouping is formulated as a combinatorial optimization problem and solved using the well-known branch and bound algorithm. Finally, the full contour of each NP is estimated by fitting an ellipse. The experiments on a real-world dataset consisting of 150 BF TEM images containing approximately 2,700 NPs show that the proposed method outperforms five current state-of-the-art approaches in overlapping NP segmentation.
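As a tiny illustration of the final step (approximating each grouped contour by an ellipse), the sketch below fits an ellipse to a synthetic contour with OpenCV's standard routine; the contour points are placeholders, not TEM data.

```python
# Minimal ellipse-fitting sketch using OpenCV; the contour is synthetic.
import cv2
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 60)
contour = np.stack([40 + 20 * np.cos(theta),      # x coordinates
                    50 + 10 * np.sin(theta)],     # y coordinates
                   axis=1).astype(np.float32).reshape(-1, 1, 2)
(cx, cy), (axis1, axis2), angle = cv2.fitEllipse(contour)
print(cx, cy, axis1, axis2, angle)
```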

Journal ArticleDOI
TL;DR: This work proposes a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN).
Abstract: In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64×64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning-based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
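A toy, hypothetical PyTorch sketch of the hierarchical prediction idea follows: a shared convolutional trunk feeds one head per level of the partition tree, each emitting partition-type logits at a coarser or finer grid over the 64×64 superblock. The channel widths, number of partition classes, and input data are placeholders, not the H-FCN architecture from the paper.

```python
# Illustrative hierarchical fully convolutional predictor sketch; all sizes are placeholders.
import torch
import torch.nn as nn

class ToyHFCN(nn.Module):
    def __init__(self, num_partition_types=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # one prediction head per tree level, operating at progressively finer grids
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(grid),
                          nn.Conv2d(32, num_partition_types, 1))
            for grid in (1, 2, 4, 8)])

    def forward(self, superblock):
        features = self.trunk(superblock)
        return [head(features) for head in self.heads]   # per-level partition logits

luma = torch.rand(2, 1, 64, 64)            # placeholder raw 64x64 luma superblocks
for level, logits in enumerate(ToyHFCN()(luma)):
    print(level, logits.shape)             # (2, 4, g, g) for g in 1, 2, 4, 8
```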

Book ChapterDOI
07 Oct 2019
TL;DR: The high segmentation performance of the proposed method when evaluated on large-scale quantitative breast MR images confirms its potential applicability in future breast cancer clinical applications.
Abstract: Pectoral muscle segmentation is a crucial step in various computer-aided applications of breast Magnetic Resonance Imaging (MRI). Due to imaging artifacts and the homogeneity between the pectoral and breast regions, pectoral muscle boundary estimation is not a trivial task. In this paper, a fully automatic segmentation method based on deep learning is proposed for accurate delineation of the pectoral muscle boundary in axial breast MR images. The proposed method involves two main steps: pectoral muscle segmentation and boundary estimation. For pectoral muscle segmentation, a model based on the U-Net architecture is used to segment the pectoral muscle from the input image. Next, the pectoral muscle boundary is estimated through candidate point detection and contour segmentation. The proposed method was evaluated quantitatively on two real-world datasets: our own private dataset and a publicly available dataset. The first dataset includes breast MR images of 12 patients and the second consists of breast MR images of 80 patients. The proposed method achieved a Dice score of 95% on the first dataset and 89% on the second dataset. The high segmentation performance of the proposed method when evaluated on large-scale quantitative breast MR images confirms its potential applicability in future breast cancer clinical applications.

Journal ArticleDOI
TL;DR: ProxIQA as discussed by the authors is a proxy network that mimics the perceptual model while serving as a loss layer of the network, which can be applied to train an end-to-end optimized image compression network.
Abstract: The use of $\ell_p$ $(p=1,2)$ norms has largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess the loss of visual information, these simple norms are not very consistent with human perception. Here, we describe a different "proximal" approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, broadly termed ProxIQA, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of an existing deep image compression model, we are able to demonstrate a bitrate reduction of as much as $31\%$ over MSE optimization, given a specified perceptual quality (VMAF) level.
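A conceptual PyTorch sketch of the proxy-loss idea follows: a small network is first fit to mimic a (possibly non-differentiable) perceptual quality measure, after which it can serve as a differentiable loss term for the image-processing network being trained. The tiny architecture, the perceptual_metric() stand-in, and the single training step are placeholders, not the ProxIQA implementation.

```python
# Hedged sketch of a proxy perceptual loss; everything below is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyQualityNet(nn.Module):
    """Tiny stand-in proxy that scores a (reference, reconstruction) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

    def forward(self, reference, reconstruction):
        return self.net(torch.cat([reference, reconstruction], dim=1))

def perceptual_metric(ref, rec):
    # stand-in for an external quality model (e.g. a VMAF call); here simply MSE
    return ((ref - rec) ** 2).mean(dim=(1, 2, 3)).unsqueeze(1)

proxy = ProxyQualityNet()
optimizer = torch.optim.Adam(proxy.parameters(), lr=1e-4)

reference = torch.rand(4, 3, 64, 64)          # placeholder image batch
reconstruction = torch.rand(4, 3, 64, 64)     # placeholder compressed output

# Step 1: fit the proxy to the target metric on observed pairs.
optimizer.zero_grad()
loss = F.mse_loss(proxy(reference, reconstruction),
                  perceptual_metric(reference, reconstruction))
loss.backward()
optimizer.step()
# Step 2 (not shown): freeze the proxy and use its output as a differentiable
# quality term in the loss of the compression network producing `reconstruction`.
```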


Journal ArticleDOI
TL;DR: The experimental results show that the objective quality scores obtained can be reliably used to eliminate well logs of inferior quality from the processing pipeline, which can serve as a beneficial step to reduce the human hours spent in examining well logs and to improve the rate of information retrieval as well as the accuracy of retrieved information.
Abstract: Assessing the image quality of well logs is essential to ensure the accuracy of their digitization and subsequent processing. Currently, the suitability of well logs for information retrieval is solely determined on the basis of subjective judgments of their image quality by human experts. The success of natural scene statistics (NSS)-based models that are used to conduct no-reference (NR) quality assessment of photographic images motivates us to try to exploit them to characterize the quality of nonphotographic images, such as well logs. Accordingly, we develop a scheme to characterize the quality of a well log as “acceptable” or “unacceptable” for subsequent processing based on the natural image quality evaluator (NIQE), a successful NR image quality assessment model based on the NSS. Our experimental results show that the objective quality scores thus obtained can be reliably used to eliminate well logs of inferior quality from the processing pipeline, which can serve as a beneficial step to reduce the human hours spent in examining well logs and to improve the rate of information retrieval as well as the accuracy of retrieved information. Source code for the trained well log image quality predictor is available at https://github.com/Somdyuti2/Well_log_IQA .