
Showing papers by "Alan C. Bovik published in 2015"


Journal ArticleDOI
TL;DR: The proposed opinion-unaware BIQA method does not need any distorted sample images or subjective quality scores for training, yet extensive experiments demonstrate its superior quality-prediction performance to state-of-the-art opinion-aware BIQA methods.
Abstract: Existing blind image quality assessment (BIQA) methods are mostly opinion-aware. They learn regression models from training images with associated human subjective scores to predict the perceptual quality of test images. Such opinion-aware methods, however, require a large amount of training samples with associated human subjective scores and of a variety of distortion types. The BIQA models learned by opinion-aware methods often have weak generalization capability, thereby limiting their usability in practice. By comparison, opinion-unaware methods do not need human subjective scores for training, and thus have greater potential for good generalization capability. Unfortunately, thus far no opinion-unaware BIQA method has shown consistently better quality prediction accuracy than the opinion-aware methods. Here, we aim to develop an opinion-unaware BIQA method that can compete with, and perhaps outperform, the existing opinion-aware methods. By integrating the features of natural image statistics derived from multiple cues, we learn a multivariate Gaussian model of image patches from a collection of pristine natural images. Using the learned multivariate Gaussian model, a Bhattacharyya-like distance is used to measure the quality of each image patch, and then an overall quality score is obtained by average pooling. The proposed BIQA method does not need any distorted sample images or subjective quality scores for training, yet extensive experiments demonstrate its superior quality-prediction performance to state-of-the-art opinion-aware BIQA methods. The MATLAB source code of our algorithm is publicly available at www.comp.polyu.edu.hk/~cslzhang/IQA/ILNIQE/ILNIQE.htm.
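As a rough illustration of the scoring pipeline described in the abstract, the sketch below fits a multivariate Gaussian (MVG) to NSS feature vectors extracted from pristine patches and scores a test image by a Bhattacharyya-like distance, average-pooled over patches. It is a minimal sketch, not the released MATLAB implementation: feature extraction is assumed to have happened elsewhere, and the test-patch covariance is approximated by a single covariance shared across the image.

```python
import numpy as np

def fit_pristine_mvg(features):
    """features: (n_patches, n_dims) NSS feature vectors pooled from pristine images."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def bhattacharyya_like_distance(mu_p, cov_p, mu_t, cov_t):
    """Distance between the pristine MVG and a test MVG (larger = more distorted)."""
    cov_avg = (cov_p + cov_t) / 2.0
    diff = mu_p - mu_t
    # The pseudo-inverse guards against ill-conditioned covariance estimates.
    d2 = diff @ np.linalg.pinv(cov_avg) @ diff
    return float(np.sqrt(max(d2, 0.0)))

def quality_score(mu_p, cov_p, test_features):
    """test_features: (n_patches, n_dims) features of one test image.
    Each patch is scored against the pristine model; average pooling of the
    per-patch distances yields the overall quality score."""
    cov_t = np.cov(test_features, rowvar=False)  # shared test covariance (a simplification)
    return float(np.mean([bhattacharyya_like_distance(mu_p, cov_p, f, cov_t)
                          for f in test_features]))
```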

783 citations


Journal ArticleDOI
TL;DR: The proposed model, called Fog Aware Density Evaluator (FADE), predicts the visibility of a foggy scene from a single image without reference to a corresponding fog-free image, without dependence on salient objects in a scene, without side geographical camera information, and without estimating a depth-dependent transmission map.
Abstract: We propose a referenceless perceptual fog density prediction model based on natural scene statistics (NSS) and fog aware statistical features. The proposed model, called Fog Aware Density Evaluator (FADE), predicts the visibility of a foggy scene from a single image without reference to a corresponding fog-free image, without dependence on salient objects in a scene, without side geographical camera information, without estimating a depth-dependent transmission map, and without training on human-rated judgments. FADE only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. Fog aware statistical features that define the perceptual fog density index derive from a space domain NSS model and the observed characteristics of foggy images. FADE not only predicts perceptual fog density for the entire image, but also provides a local fog density index for each patch. The predicted fog density using FADE correlates well with human judgments of fog density taken in a subjective study on a large foggy image database. As applications, FADE not only accurately assesses the performance of defogging algorithms designed to enhance the visibility of foggy images, but also is well suited for image defogging. A new FADE-based referenceless perceptual image defogging algorithm, dubbed DEnsity of Fog Assessment-based DEfogger (DEFADE), achieves better results for darker, denser foggy images, as well as on standard foggy images, than state-of-the-art defogging methods. A software release of FADE and DEFADE is available online for public use: http://live.ece.utexas.edu/research/fog/index.html.
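FADE's exact feature set is not reproduced here; as a hedged illustration, the sketch below computes a few fog-sensitive statistics of the kind the abstract alludes to (the variance of mean-subtracted contrast-normalized coefficients, a dark-channel mean, and mean saturation), assuming an RGB image scaled to [0, 1].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, minimum_filter

def mscn(image, sigma=7 / 6):
    """Mean-subtracted, contrast-normalized luminance (a space-domain NSS transform)."""
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.maximum(var, 0.0)) + 1.0)

def fog_aware_features(rgb):
    """rgb: float array in [0, 1], shape (H, W, 3). Returns a few example features."""
    gray = rgb.mean(axis=2)
    coeffs = mscn(gray)
    variance = float(coeffs.var())                        # fog flattens MSCN statistics
    dark_channel = minimum_filter(rgb.min(axis=2), size=15)
    dark = float(dark_channel.mean())                     # fog raises the dark channel
    saturation = float((1.0 - rgb.min(axis=2) / (rgb.max(axis=2) + 1e-6)).mean())
    return np.array([variance, dark, saturation])
```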

510 citations


Journal ArticleDOI
TL;DR: The LIVE In the Wild Image Quality Challenge Database contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices, and has been used to conduct a very large-scale, multi-month image quality assessment subjective study.
Abstract: Most publicly available image quality databases have been created under highly controlled conditions by introducing graded simulated distortions onto high-quality photographs. However, images captured using typical real-world mobile camera devices are usually afflicted by complex mixtures of multiple distortions, which are not necessarily well-modeled by the synthetic distortions found in existing databases. The originators of existing legacy databases usually conducted human psychometric studies to obtain statistically meaningful sets of human opinion scores on images in a stringently controlled visual environment, resulting in small data collections relative to other kinds of image analysis databases. Towards overcoming these limitations, we designed and created a new database that we call the LIVE In the Wild Image Quality Challenge Database, which contains widely diverse authentic image distortions on a large number of images captured using a representative variety of modern mobile devices. We also designed and implemented a new online crowdsourcing system, which we have used to conduct a very large-scale, multi-month image quality assessment subjective study. Our database consists of over 350000 opinion scores on 1162 images evaluated by over 7000 unique human observers. Despite the lack of control over the experimental environments of the numerous study participants, we demonstrate excellent internal consistency of the subjective dataset. We also evaluate several top-performing blind Image Quality Assessment algorithms on it and present insights on how mixtures of distortions challenge both end users as well as automatic perceptual quality prediction models.

207 citations


Journal ArticleDOI
TL;DR: A new no-reference stereoscopic/3D IQA framework is developed, dubbed stereoscopic-3D blind image naturalness quality index, which utilizes both univariate and generalized bivariate natural scene statistics (NSS) models.
Abstract: In recent years, bandpass statistical models of natural, photographic images of the world have been used with great success to solve highly diverse problems involving image representation, image repair, image quality assessment (IQA), and image compression. One missing element has been a reliable and generic model of spatial image correlation that reflects the distributions of oriented and relatively oriented spatial structures. We have developed such a model for bandpass pristine images and have generalized it here to also capture the spatial correlation structure of bandpass distorted images. The model applies well to both luminance and depth images. As a demonstration of the usefulness of the generalized model, we develop a new no-reference stereoscopic/3D IQA framework, dubbed stereoscopic/3D blind image naturalness quality index, which utilizes both univariate and generalized bivariate natural scene statistics (NSS) models. We first validate the robustness and effectiveness of these novel bivariate and correlation NSS features extracted from distorted stereopairs, then demonstrate that they are predictive of distortion severity. Our experimental results show that the resulting 3D image quality predictor based in part on the new model outperforms state-of-the-art full- and no-reference 3D IQA algorithms on both symmetrically and asymmetrically distorted stereoscopic image pairs.

86 citations


Journal ArticleDOI
TL;DR: In this article, a model-based neuronal and statistical framework called the 3D visual discomfort predictor (3D-VDP) was developed to automatically predict the level of visual discomfort that is experienced when viewing S3D images.
Abstract: Being able to predict the degree of visual discomfort that is felt when viewing stereoscopic 3D (S3D) images is an important goal toward ameliorating causative factors, such as excessive horizontal disparity, misalignments or mismatches between the left and right views of stereo pairs, or conflicts between different depth cues. Ideally, such a model should account for such factors as capture and viewing geometries, the distribution of disparities, and the responses of visual neurons. When viewing modern 3D displays, visual discomfort is caused primarily by changes in binocular vergence while accommodation is held fixed at the viewing distance to a flat 3D screen. This results in unnatural mismatches between ocular fixations and ocular focus that do not occur in normal direct 3D viewing. This accommodation-vergence conflict can cause adverse effects, such as headaches, fatigue, eye strain, and reduced visual ability. Binocular vision is ultimately realized by means of neural mechanisms that subserve the sensorimotor control of eye movements. Realizing that the neuronal responses are directly implicated in both the control and experience of 3D perception, we have developed a model-based neuronal and statistical framework called the 3D visual discomfort predictor (3D-VDP) that automatically predicts the level of visual discomfort that is experienced when viewing S3D images. 3D-VDP extracts two types of features: 1) coarse features derived from the statistics of binocular disparities and 2) fine features derived by estimating the neural activity associated with the processing of horizontal disparities. In particular, we deploy a model of horizontal disparity processing in the extrastriate middle temporal region of the occipital lobe. We compare the performance of 3D-VDP with other recent discomfort prediction algorithms with respect to correlation against recorded subjective visual discomfort scores, and show that 3D-VDP is statistically superior to the other methods.
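The "coarse" feature type mentioned above is built from the statistics of binocular disparities; the snippet below computes a few such distributional statistics from a horizontal-disparity map as an illustration only (the paper's specific feature set and the neural "fine" features of 3D-VDP are not reproduced).

```python
import numpy as np
from scipy.stats import skew, kurtosis

def coarse_disparity_features(disparity_map):
    """Simple distributional statistics of horizontal disparities (illustrative set)."""
    d = np.asarray(disparity_map, dtype=float).ravel()
    spread = np.percentile(d, 95) - np.percentile(d, 5)  # disparity-range proxy
    return np.array([d.mean(), d.std(), skew(d), kurtosis(d), spread])
```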

81 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a rate adaptation algorithm that can incorporate QoE constraints on the empirical cumulative distribution function (eCDF) of the predicted video quality, which can reduce the risk of playback interruptions caused by channel throughput fluctuations.
Abstract: Adapting video data rate during streaming can effectively reduce the risk of playback interruptions caused by channel throughput fluctuations. The variations in rate, however, also introduce video quality fluctuations and thus potentially affect viewers' Quality of Experience (QoE). We show how the QoE of video users can be improved by rate adaptation and admission control. We conducted a subjective study wherein we found that viewers' QoE was strongly correlated with the empirical cumulative distribution function (eCDF) of the predicted video quality. Based on this observation, we propose a rate-adaptation algorithm that can incorporate QoE constraints on the empirical cumulative quality distribution per user. We then propose a threshold-based admission control policy to block users whose empirical cumulative quality distribution is not likely to satisfy their QoE constraint. We further devise an online adaptation algorithm to automatically optimize the threshold. Extensive simulation results show that the proposed scheme can reduce network resource consumption by 40% over conventional average-quality maximized rate-adaptation algorithms.
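A hedged sketch of the eCDF-based QoE constraint idea described above; the variable names and the specific form of the constraint (a cap on the fraction of low-quality segments) are illustrative choices, not taken from the paper.

```python
import numpy as np

def ecdf(samples, x):
    """Empirical CDF of `samples` evaluated at x: P(Q <= x)."""
    return float(np.mean(np.asarray(samples) <= x))

def satisfies_qoe(predicted_quality, q_min, max_fraction_below):
    """QoE constraint: at most `max_fraction_below` of segments may fall
    at or below the quality level `q_min`."""
    return ecdf(predicted_quality, q_min) <= max_fraction_below

def admit_user(predicted_quality, q_min=30.0, max_fraction_below=0.1):
    """Threshold-based admission control: block users whose predicted
    per-segment quality distribution is unlikely to satisfy the constraint."""
    return satisfies_qoe(predicted_quality, q_min, max_fraction_below)
```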

59 citations


Proceedings ArticleDOI
TL;DR: The proposed model accounts for and adapts to the recency, or hysteresis effect caused by a stall event in addition to accounting for the lengths, frequency of occurrence, and the positions of stall events - factors that interact in a complex way to affect a user's QoE.
Abstract: Over-the-top mobile video streaming is invariably influenced by volatile network conditions which cause playback interruptions (stalling events), thereby impairing users' quality of experience (QoE). Developing models that can accurately predict users' QoE could enable the more efficient design of quality-control protocols for video streaming networks that reduce network operational costs while still delivering high-quality video content to the customers. Existing objective models that predict QoE are based on global video features, such as the number of stall events and their lengths, and are trained and validated on a small pool of ad hoc video datasets, most of which are not publicly available. The model we propose in this work goes beyond previous models as it also accounts for the fundamental effect that a viewer's recent level of satisfaction or dissatisfaction has on their overall viewing experience. In other words, the proposed model accounts for and adapts to the recency, or hysteresis effect caused by a stall event in addition to accounting for the lengths, frequency of occurrence, and the positions of stall events - factors that interact in a complex way to affect a user's QoE. On the recently introduced LIVE-Avvasi Mobile Video Database, which consists of 180 distorted videos of varied content that are afflicted solely with over 25 unique realistic stalling events, we trained and validated our model to accurately predict the QoE, attaining standout QoE prediction performance.
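One illustrative way to encode the recency/hysteresis idea, alongside stall count and total stall length, is an exponentially decaying memory that weights stalls near the end of playback more heavily. This is a sketch of the general idea only, not the paper's model; the time constant and the feature set are arbitrary choices.

```python
import numpy as np

def stall_qoe_features(stall_starts, stall_durations, video_length, tau=20.0):
    """stall_starts/stall_durations in seconds. Returns simple global stall
    features plus a recency-weighted penalty evaluated at the end of playback."""
    starts = np.asarray(stall_starts, dtype=float)
    durations = np.asarray(stall_durations, dtype=float)
    # Hysteresis term: stalls that end closer to the end of the video decay less.
    recency = np.exp(-(video_length - (starts + durations)) / tau)
    return {
        "num_stalls": int(len(durations)),
        "total_stall_time": float(durations.sum()),
        "recency_weighted_penalty": float((durations * recency).sum()),
    }
```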

36 citations


Journal ArticleDOI
TL;DR: A fully automated method that extracts channels from remotely sensed images and estimates their widths based on a recently proposed multiscale singularity index that strongly responds to curvilinear structures but weakly responds to edges is proposed.
Abstract: The quantitative analysis of channel networks plays an important role in river studies. To provide a quantitative representation of channel networks, we propose a new method that extracts channels from remotely sensed images and estimates their widths. Our fully automated method is based on a recently proposed multiscale singularity index that strongly responds to curvilinear structures but weakly responds to edges. The algorithm produces a channel map using a single image where water and nonwater pixels have contrast, such as a Landsat near-infrared band image or a water index defined on multiple bands. The proposed method provides a robust alternative to the procedures that are used in the remote sensing of fluvial geomorphology and makes the classification and analysis of channel networks easier. The source code of the algorithm is available at http://live.ece.utexas.edu/research/cne/ .
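The published singularity index has its own specific construction; as a generic stand-in, the sketch below computes a scale-normalized ridge response from Gaussian second derivatives over several scales, which likewise responds preferentially to curvilinear (channel-like) structures, with the best-responding scale serving as a crude width proxy.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def channel_map(image, sigmas=(1, 2, 4, 8)):
    """Maximum scale-normalized ridge response over scales; the argmax scale
    gives a rough indication of local channel width."""
    image = np.asarray(image, dtype=float)
    responses = []
    for s in sigmas:
        ixx = gaussian_filter(image, s, order=(0, 2))  # second derivative along x
        iyy = gaussian_filter(image, s, order=(2, 0))  # second derivative along y
        ixy = gaussian_filter(image, s, order=(1, 1))
        disc = np.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
        lam1 = (ixx + iyy) / 2.0 + disc
        lam2 = (ixx + iyy) / 2.0 - disc
        lam = np.where(np.abs(lam1) > np.abs(lam2), lam1, lam2)  # dominant Hessian eigenvalue
        responses.append((s ** 2) * np.abs(lam))                  # scale normalization
    stack = np.stack(responses)
    strength = stack.max(axis=0)
    best_scale = np.asarray(sigmas)[stack.argmax(axis=0)]
    return strength, best_scale
```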

35 citations


Proceedings ArticleDOI
TL;DR: A deep belief network is designed that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction.
Abstract: Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.
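The "model-based statistical image features" mentioned above come from NSS models; one standard example is the pair of generalized Gaussian parameters fitted to divisively normalized (MSCN-type) coefficients, estimated below by moment matching. This shows only one feature of that family; the paper's full feature set and the deep belief network are not reproduced here.

```python
import numpy as np
from scipy.special import gamma

def ggd_features(coeffs):
    """Moment-matching estimate of generalized-Gaussian shape (alpha) and
    scale (beta) from a 2-D array of divisively normalized coefficients."""
    coeffs = coeffs.ravel()
    sigma_sq = np.mean(coeffs ** 2)
    e_abs = np.mean(np.abs(coeffs))
    rho = sigma_sq / (e_abs ** 2 + 1e-12)
    # Invert rho = Gamma(1/a) * Gamma(3/a) / Gamma(2/a)^2 over a grid of shapes.
    shapes = np.arange(0.2, 10.0, 0.001)
    r = gamma(1.0 / shapes) * gamma(3.0 / shapes) / gamma(2.0 / shapes) ** 2
    alpha = shapes[np.argmin(np.abs(r - rho))]
    beta = np.sqrt(sigma_sq * gamma(1.0 / alpha) / gamma(3.0 / alpha))
    return alpha, beta
```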

33 citations


Journal ArticleDOI
TL;DR: The experimental results demonstrate that the TVDM transfer function model produces predictions that correlate highly with the subjective visual discomfort scores contained in the large public databases, and yields insights into the perceptual processes that yield a stable S3D image.
Abstract: When viewing 3D images, a sense of visual comfort (or lack thereof) is developed in the brain over time as a function of binocular disparity and other 3D factors. We have developed a unique temporal visual discomfort model (TVDM) that we use to automatically predict the degree of discomfort felt when viewing stereoscopic 3D (S3D) images. This model is based on physiological mechanisms. In particular, TVDM is defined as a second-order system capturing relevant neuronal elements of the visual pathway from the eyes and through the brain. The experimental results demonstrate that the TVDM transfer function model produces predictions that correlate highly with the subjective visual discomfort scores contained in large public databases. The transfer function analysis also yields insights into the perceptual processes that yield a stable S3D image.

26 citations


Book ChapterDOI
01 Jan 2015
TL;DR: This chapter discusses the challenges and difficulties one may face while trying to design and develop an effective objective quality assessment (QA) algorithm for stereoscopic images, and examines and analyzes stereoscopic QA algorithms, focusing mainly on advances in exploiting natural scene statistics (NSS) and human visual system models in the design of stereoscopicQA algorithms.
Abstract: Visual quality assessment of stereoscopic/3D images and videos has become an increasingly important and active field of research with the rapid growth in the quantity of stereoscopic/3D content created by the cinema, television, and entertainment industries. However, due to the diversity of stereoscopic/3D display technology and the complexity of human 3D perception, understanding the quality of experience (QoE) of stereoscopic/3D image and video is a difficult and multidisciplinary problem. Objective visual quality assessment attempts to quantify this subjective perception of visual QoE, utilizing tools from engineering, visual science, and psychology. In this chapter, first we discuss the challenges and difficulties one may face while trying to design and develop an effective objective quality assessment (QA) algorithm for stereoscopic images. This discussion is limited to “quality” where the stimulus being perceived is affected by some kind of distortions. In contrast to the success of a variety of objective QA algorithms for 2D images and videos, the field of stereoscopic image and video QA has been less successful in finding widely adopted quality measures. Most objective stereoscopic QA algorithms can be regarded as extensions of 2D QA algorithms, while few of them consider some aspects of depth perception and utilize either computed or measured depth/disparity information from the stereo pairs. We examine and analyze these stereoscopic QA algorithms, while focusing mainly on advances in exploiting natural scene statistics (NSS) and human visual system models in the design of stereoscopic QA algorithms. We also discuss recent work conducted on evaluating visual discomfort and fatigue when viewing stereoscopic images and videos—the more comprehensive “quality-of-experience” evaluation. Finally, we conclude the chapter with a discussion of possible future directions that the field of stereoscopic image and video QA may take. Our summary focuses on gaining a better understanding of depth/disparity sensation, using accurate and robust statistical models of natural stereo pairs, and performing a thorough analysis of various factors affecting the perception of stereoscopic distortions.

Journal ArticleDOI
TL;DR: These models can be used to develop 3D content creation algorithms that can convert monocular 2D videos into statistically natural 3D-viewable videos and outperforms several state-of-the-art 2D-to-3D conversion methods.
Abstract: Natural scene statistics (NSSs) models have been developed that make it possible to impose useful perceptually relevant priors on the luminance, colors, and depth maps of natural scenes. We show that these models can be used to develop 3D content creation algorithms that can convert monocular 2D videos into statistically natural 3D-viewable videos. First, accurate depth information on key frames is obtained via human annotation. Then, both forward and backward motion vectors are estimated and compared to decide the initial depth values, and a compensation process is applied to further improve the depth initialization. Then, the luminance/chrominance and initial depth map are decomposed by a Gabor filter bank. Each subband of depth is modeled to produce a NSS prior term. The statistical color-depth priors are combined with the spatial smoothness constraint in the depth propagation target function as a prior regularizing term. The final depth map associated with each frame of the input 2D video is optimized by minimizing the target function over all subbands. In the end, stereoscopic frames are rendered from the color frames and their associated depth maps. We evaluated the quality of the generated 3D videos using both subjective and objective quality assessment methods. The experimental results obtained on various sequences show that the presented method outperforms several state-of-the-art 2D-to-3D conversion methods.
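The decomposition step above uses a Gabor filter bank; the sketch below builds a generic even-symmetric Gabor bank and applies it to an image (luminance, chrominance, or a depth map). The scales, orientations, and bandwidth are illustrative choices, not the configuration used in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(wavelength, theta, sigma, size=31):
    """Real (even-symmetric) Gabor kernel at one scale and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

def gabor_subbands(image, wavelengths=(4, 8, 16), n_orient=4):
    """Decompose an image into a list of subband responses."""
    image = np.asarray(image, dtype=float)
    subbands = []
    for wl in wavelengths:
        for k in range(n_orient):
            kern = gabor_kernel(wl, theta=k * np.pi / n_orient, sigma=0.56 * wl)
            subbands.append(fftconvolve(image, kern, mode='same'))
    return subbands
```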

Journal ArticleDOI
TL;DR: It is believed that sufficiently fast and coherent motion silences the perception of flicker distortions on naturalistic videos in agreement with a recently observed "motion silencing" effect on synthetic stimuli.
Abstract: We study the influence of motion on the visibility of flicker distortions in naturalistic videos. A series of human subjective studies were executed to understand how motion silences the visibility of flicker distortions as a function of object motion, flicker frequency, and video quality. We found that flicker visibility is strongly reduced when the speed of coherent motion is large, and the effect is pronounced when video quality is poor. Based on this finding, we propose a model of flicker visibility on naturalistic videos. The target-related activation levels in the excitatory layer of neurons were estimated for a displayed video using a spatiotemporal backward masking model, and then the flicker visibility is predicted based on a learned model of neural flicker adaptation processes. Experimental results show that the prediction of flicker visibility using the proposed model correlates well with human perception of flicker distortions. We believe that sufficiently fast and coherent motion silences the perception of flicker distortions on naturalistic videos in agreement with a recently observed "motion silencing" effect on synthetic stimuli. We envision that the proposed model could be applied to develop perceptual video quality assessment algorithms that can predict "silenced" temporal distortions and account for them when computing quality judgments.

Highlights: We study motion silencing of flicker distortions on naturalistic videos. Flicker visibility is strongly reduced when the speed of object motion is large. Motion silencing of flicker distortions is pronounced when video quality is poor. We propose a model of flicker visibility on naturalistic videos. The model prediction of flicker visibility correlates well with human perception.

Journal ArticleDOI
TL;DR: This work proposes a new closed-form spatial-oriented correlation model that captures statistical regularities between perceptually decomposed natural image luminance samples and validates the new correlation model on a variety of natural images.
Abstract: Most prevalent statistical models of natural images characterize only the univariate distributions of divisively normalized bandpass responses or wavelet-like decompositions of them. However, the higher-order dependencies between spatially neighboring responses are not yet well understood. Towards filling this gap, we propose a new closed-form spatial-oriented correlation model that captures statistical regularities between perceptually decomposed natural image luminance samples. We validate the new correlation model on a variety of natural images. Experimental results demonstrate the robustness of the new correlation model across image content. A software release that implements the new closed-form spatial-oriented correlation model is available at http://live.ece.utexas.edu/research/3dnss/bicorr_release.zip.
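The statistic that the closed-form model describes can be measured empirically as the correlation between divisively normalized (MSCN-type) bandpass responses and spatially shifted copies of themselves; the sketch below computes those empirical values. The paper's actual contribution, the closed-form fit itself, is not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6):
    """Mean-subtracted, contrast-normalized (divisively normalized) responses."""
    image = np.asarray(image, dtype=float)
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.maximum(var, 0.0)) + 1.0)

def shifted_correlation(coeffs, dx, dy):
    """Pearson correlation between coefficients and a copy shifted by (dx, dy)."""
    h, w = coeffs.shape
    a = coeffs[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
    b = coeffs[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```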

Journal ArticleDOI
TL;DR: A cross-layer optimization-based scheduling scheme called binding optimization of duty cycling and networking through energy tracking (BUCKET) is developed, which is formulated in four stages and displays performance enhancements of ~12-15% over those of conventional methods in terms of the average service rate.
Abstract: Renewable solar energy harvesting systems have received considerable attention as a possible substitute for conventional chemical batteries in sensor networks. However, it is difficult to optimize the use of solar energy based only on empirical power acquisition patterns in sensor networks. We apply acquisition patterns from actual solar energy harvesting systems and build a framework to maximize the utilization of solar energy in general sensor networks. To achieve this goal, we develop a cross-layer optimization-based scheduling scheme called binding optimization of duty cycling and networking through energy tracking (BUCKET), which is formulated in four stages: 1) prediction of energy harvesting and arriving traffic; 2) internode optimization at the transport and network layers; 3) intranode optimization at the medium access control layer; and 4) flow control of generated communication task sets using a token-bucket algorithm. Monitoring of the structural health of bridges is shown to be a potential application of an energy-harvesting sensor network. The example network deploys five sensor types: 1) temperature; 2) strain gauge; 3) accelerometer; 4) pressure; and 5) humidity. In the simulations, the BUCKET algorithm displays performance enhancements of ~12-15% over those of conventional methods in terms of the average service rate.
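Stage 4 of the scheme uses a token-bucket algorithm to control the flow of generated communication task sets. The following is a generic token-bucket sketch, with rate and capacity as illustrative parameters rather than values from the paper.

```python
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now, cost=1.0):
        """Refill tokens for the elapsed time, then admit the task if enough
        tokens are available; otherwise it stays queued."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A generated task set would be forwarded when allow() returns True and buffered until enough tokens accumulate otherwise.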

Journal ArticleDOI
TL;DR: Models are developed that yield motion features extracted from videos and used in a new video saliency detection method called spatial-temporal weighted dissimilarity (STWD), which is highly competitive with, and delivers better performance than, state-of-the-art methods.
Abstract: Accurately modeling and predicting the visual attention behavior of human viewers can help a video analysis algorithm find interesting regions by reducing the search effort of tasks, such as object detection and recognition. In recent years, a great number and variety of visual attention models for predicting the direction of gaze on images and videos have been proposed. When a human views video, the motions of both objects in the video and of the camera greatly affect the distribution of visual fixations. Here we develop models that lead to motion features that are extracted from videos and used in a new video saliency detection method called spatial-temporal weighted dissimilarity (STWD). To achieve efficiency, frames are partitioned into blocks on which saliency calculations are made. Two spatial features are defined on each block, termed spatial dissimilarity and preference difference, which are used to characterize the spatial conspicuity of each block. The motion features extracted from each block are simple differences of motion vectors between adjacent frames. Finally, the spatial and motion features are used to generate a saliency map on each frame. Experiments on three public video datasets containing 185 video clips and corresponding eye traces revealed that the proposed saliency detection method is highly competitive with, and delivers better performance than, state-of-the-art methods.

Highlights: We proposed a new video saliency detection method to allow for motion conspicuity. Spatial dissimilarity and preference difference are used as spatial conspicuity measures. The motion features are differences of motion vectors between adjacent frames. Experiments were conducted on the ORIG-CRCNS, MTV, and DIEM datasets.
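A coarse sketch of the block-based feature idea described above; the block size, the toy block descriptor, and the dissimilarity measure are illustrative stand-ins, not the paper's exact STWD formulation.

```python
import numpy as np

def block_features(frame, block=16):
    """Mean intensity per block as a toy block descriptor."""
    h, w = frame.shape[0] // block, frame.shape[1] // block
    return frame[:h * block, :w * block].reshape(h, block, w, block).mean(axis=(1, 3))

def spatial_dissimilarity(feats):
    """Each block's mean absolute feature difference to all other blocks."""
    flat = feats.ravel()
    return np.abs(flat[:, None] - flat[None, :]).mean(axis=1).reshape(feats.shape)

def motion_feature(mv_prev, mv_curr):
    """Magnitude of the motion-vector change between adjacent frames, per block.
    mv_prev, mv_curr: arrays of shape (blocks_y, blocks_x, 2)."""
    return np.linalg.norm(mv_curr - mv_prev, axis=-1)
```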

Proceedings ArticleDOI
09 Dec 2015
TL;DR: This work proposes a novel IQA model that focuses on the natural scene statistics of images afflicted with complex mixtures of unknown, authentic distortions, and derives several feature maps in different perceptually relevant color spaces and extracts a large number of image features from them.
Abstract: Current top-performing blind image quality assessment (IQA) models rely on benchmark databases comprising singly distorted images, thereby learning image features that are only adequate to predict human-perceived visual quality on such inauthentic distortions. Furthermore, the underlying image features of these models are often extracted from the achromatic luminance channel and can fail to account for losses of perceived quality that might be more distinctly captured in a different image modality. In this work, we propose a novel IQA model that focuses on the natural scene statistics of images afflicted with complex mixtures of unknown, authentic distortions. We derive several feature maps in different perceptually relevant color spaces and extract a large number of image features from them. We demonstrate the remarkable competence of our features in improving the automatic perceptual quality prediction on images containing both synthetic and authentic distortions.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This model extends Su et al.'s closed-form correlation model to non-adjacent distant bandpass image responses over multiple spatial orientations and scales and considers the effects of spatial distance between the bandpass samples.
Abstract: Building natural scene statistic models is a potentially transformative development for a wide variety of visual applications, ranging from the design of faithful image and video quality models to the development of perceptually optimized image enhancing techniques. Most predominant statistical models of natural images only characterize the univariate distributions of divisively normalized bandpass image responses. Previous efforts towards modeling bandpass natural responses have not focused on finding closed-form quantitative models of bivariate natural statistics. Towards filling this gap, Su et al. [1] recently modeled spatially adjacent bandpass image responses over multiple scales; however, they did not consider the effects of spatial distance between the bandpass samples. Here we build on Su et al.'s model and extend their closed-form correlation model to non-adjacent distant bandpass image responses over multiple spatial orientations and scales.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Using this model, a new denoising algorithm called the Gaussian scale mixture perceptual pattern denoiser is created, which can fully characterize non-uniformity using covariance matrices.
Abstract: Infrared images are commonly afflicted by distortions such as non-uniformity. Non-uniformity is characterized by horizontal and vertical fixed pattern noise. Accurately estimating the amount of non-uniformity present in an image and removing that amount of non-uniformity noise are open problems. Several estimators of non-uniformity exist, but their ability to estimate degrades with the presence of other sources of noise. Specifically, most of these metrics lack the robustness demanded by a more complete non-uniformity model. Previous non-uniformity correction algorithms are compared and found to underperform relative to a more complete model of non-uniformity that we have developed. Using this model, we have created a new denoising algorithm, which we call the Gaussian scale mixture perceptual pattern denoiser. The new model and algorithm can fully characterize non-uniformity using covariance matrices.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Human study results and statistical analysis show that highly eccentric, coherent object motion can significantly silence the awareness of flicker distortions on naturalistic videos.
Abstract: We study the effect of eccentricity on flicker visibility in naturalistic videos. A series of human studies were executed in two tasks ("gaze the fixation mark" and "follow the moving object") to understand how object motion can reduce the visibility of flicker distortions as a function of eccentricity, motion speed, video quality, and flicker frequency. We found that either large eccentricity or large, coherent object motion could reduce flicker visibility. When they are combined, flicker visibility significantly decreased. Flicker visibility remained noticeable even at large eccentricity when the object was static. Human study results and statistical analysis show that highly eccentric, coherent object motion can significantly silence the awareness of flicker distortions on naturalistic videos.

Journal ArticleDOI
TL;DR: This work designed a unique and challenging image data set with associated human opinion scores, called the Laboratory for Image and Video Engineering (LIVE) authentic image quality challenge database, and is developing a robust blind IQA model which outperforms other state-of-the-art blind IQA algorithms on both the LIVE legacy IQA databases and the LIVE challenge database.
Abstract: Social media and rapid advances in camera and mobile device technology have led to the creation and consumption of a seemingly limitless supply of visual content. However, the vast majority of these digital images are captured by casual amateur photographers whose unsure hands and eyes often introduce annoying artifacts during acquisition. In addition, subsequent storage and transmission of visual media can further degrade their visual quality. Recent developments in visual modeling have elucidated the impact of visual distortions on perception of such pictures and videos. They have laid the foundation for automatic and accurate metrics that can identify and predict the quality of visual media as perceived by human observers.1 To address this problem, several objective blind or no-reference (NR) image quality assessment (IQA) algorithms have been developed to predict the perceptual quality of a given (possibly distorted) image without additional information.2–7 Such quality metrics could be used to monitor and control multimedia services on networks and devices or to prioritize quality of transmission over speed, for example. Real-world images are usually afflicted by mixtures of distortions that differ significantly from the single, unmixed distortions contained in restrictive and unrepresentative legacy databases.9–12 We recently designed a unique and challenging image data set with associated human opinion scores called the Laboratory for Image and Video Engineering (LIVE) authentic image quality challenge database8 (see Figure 1). Using this LIVE challenge database, we have been developing a robust blind IQA model for images suffering from real-world, authentic distortions. We call our model the 'feature maps driven referenceless image quality evaluation engine' (FRIQUEE) index. FRIQUEE outperforms other state-of-the-art blind IQA algorithms on both the LIVE legacy IQA9 and the LIVE challenge database8 (see Table 1).

Figure 1. Sample images from the Laboratory for Image and Video Engineering (LIVE) authentic image quality challenge database.8 This collection comprises 1163 images afflicted with complex mixtures of unknown distortions, of different types and severities, from diverse camera devices, and under varied illumination conditions. The content includes pictures of faces, people, animals, close-up shots, wide-angle shots, nature scenes, man-made objects, images with distinct foreground/background configurations, and images without any notable object of interest.

Journal ArticleDOI
TL;DR: A new stereo model is formulated that minimizes a global energy functional to densely estimate disparity on stereo mammogram images, by introducing a new singularity index as a constraint to obtain better estimates of disparity along critical curvilinear structures.
Abstract: We consider the problem of depth estimation on digital stereo mammograms. Being able to elucidate 3D information from stereo mammograms is an important precursor to conducting 3D digital analysis of data from this promising new modality. The problem is generally much harder than the classic stereo matching problem on visible light images of the natural world, since nearly all of the 3D structural information of interest exists as complex network of multilayered, heavily occluded curvilinear structures. Toward addressing this difficult problem, we formulate a new stereo model that minimizes a global energy functional to densely estimate disparity on stereo mammogram images, by introducing a new singularity index as a constraint to obtain better estimates of disparity along critical curvilinear structures. Curvilinear structures, such as vasculature and spicules, are particularly salient structures in the breast, and being able to accurately position them in 3D is a valuable goal. Experiments on synthetic images with known ground truth and on real stereo mammograms highlight the advantages of the proposed stereo model over the canonical stereo model.

Journal ArticleDOI
TL;DR: A new strategy is described that allows simulation of surgically plausible facial disfigurement on a novel face, for elucidating human perception of facial disfigurement; the simulation represents plausible outcomes of reconstructive surgery for facial cancers.
Abstract: Patients with facial cancers can experience disfigurement as they may undergo considerable appearance changes from their illness and its treatment. Individuals with difficulties adjusting to facial cancer are concerned about how others perceive and evaluate their appearance. Therefore, it is important to understand how humans perceive disfigured faces. We describe a new strategy that allows simulation of surgically plausible facial disfigurement on a novel face for elucidating human perception of facial disfigurement. Longitudinal 3D facial images of patients (N = 17) with facial disfigurement due to cancer treatment were replicated using a facial mannequin model, by applying Thin-Plate Spline (TPS) warping and linear interpolation on the facial mannequin model in polar coordinates. Principal Component Analysis (PCA) was used to capture longitudinal structural and textural variations found within each patient with facial disfigurement arising from the treatment. We treated such variations as disfigurement. Each disfigurement was smoothly stitched on a healthy face by seeking a Poisson solution to guided interpolation using the gradient of the learned disfigurement as the guidance field vector. The modeling technique was quantitatively evaluated. In addition, panel ratings of experienced medical professionals on the plausibility of simulation were used to evaluate the proposed disfigurement model. The algorithm reproduced the given face effectively using a facial mannequin model with less than 4.4 mm maximum error for the validation fiducial points that were not used for the processing. Panel ratings of experienced medical professionals on the plausibility of simulation showed that the disfigurement model (especially for peripheral disfigurement) yielded predictions comparable to the real disfigurements. The modeling technique of this study is able to capture facial disfigurements, and its simulation represents plausible outcomes of reconstructive surgery for facial cancers. Thus, our technique can be used to study human perception of facial disfigurement.
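The stitching step described above seeks a Poisson solution to guided interpolation; the sketch below solves the discrete Poisson equation inside a mask, using the Laplacian of a source (disfigurement) patch as the guidance-field divergence and keeping the healthy face fixed on the boundary. A plain Jacobi iteration is used for clarity; the authors' solver and parameterization may differ.

```python
import numpy as np

def poisson_blend(target, source, mask, iters=2000):
    """target: healthy-face image; source: disfigurement patch (same shape);
    mask: boolean region to blend. Returns the blended image."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    result = target.copy()
    # Divergence of the guidance field = Laplacian of the source inside the mask.
    lap = (np.roll(source, 1, 0) + np.roll(source, -1, 0) +
           np.roll(source, 1, 1) + np.roll(source, -1, 1) - 4.0 * source)
    for _ in range(iters):
        neighbors = (np.roll(result, 1, 0) + np.roll(result, -1, 0) +
                     np.roll(result, 1, 1) + np.roll(result, -1, 1))
        updated = (neighbors - lap) / 4.0   # Jacobi update of the Poisson equation
        result[mask] = updated[mask]        # boundary (unmasked) pixels stay fixed
    return result
```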

Proceedings ArticleDOI
10 Dec 2015
TL;DR: A neuronal model-based framework called Neural 3D Visual Discomfort Predictor (N3D-VDP) is proposed that automatically predicts the level of visual discomfort experienced when viewing stereoscopic 3D (S3D) images and is statistically superior to the other methods.
Abstract: Visual discomfort assessment (VDA) on stereoscopic images is of fundamental importance for making decisions regarding visual fatigue caused by unnatural binocular alignment. Nevertheless, no solid framework exists to quantify this discomfort using models of the responses of visual neurons. Binocular vision is realized by means of neural mechanisms that subserve the sensorimotor control of eye movements. We propose a neuronal model-based framework called Neural 3D Visual Discomfort Predictor (N3D-VDP) that automatically predicts the level of visual discomfort experienced when viewing stereoscopic 3D (S3D) images. The N3D-VDP model extracts features derived by estimating the neural activity associated with the processing of binocular disparities. In this regard we deploy a model of disparity processing in the extra-striate middle temporal (MT) region of occipital lobe. We compare the performance of N3D-VDP with other recent VDA algorithms using correlations against reported subjective visual discomfort, and show that N3D-VDP is statistically superior to the other methods.

01 Jan 2015
TL;DR: It is found that this particular I/VQA model is not apt for evaluating collections with varied content, but that implementing such models at large scale can narrow the problem of curating very large digital video collections and lead to preservation and access decisions based on informed priorities.
Abstract: As the production, the variety, and the consumption of born-digital video grow, so does the demand for acquiring, curating and preserving large-scale digital video collections. As a multidisciplinary team of curators, computer scientists, and video engineers, we explore the use of No-Reference Image and Video Quality Algorithms (I/VQA), specifically of BRISQUE in this paper, to automatically derive ranges of video quality. An important characteristic of these algorithms is that they are modeled after human perception. We run the algorithms in a High Performance Computing (HPC) environment to obtain results for many videos at the same time, accelerating time to results and precision in computing per-frame and per-video quality assessment scores. Results, which were evaluated quantitatively and qualitatively, suggest that BRISQUE identifies the distortions on which it was trained, and performs well in videos that have natural scenes and do not have drastic scene changes. While we found that this particular model is not apt for evaluating collections with varied content, the results suggest that research into other I/VQA models is promising, and that their implementation at large scale can narrow the problem of curating very large digital video collections and lead to preservation and access decisions based on informed priorities.

Introduction

The use of video has become significant and pervasive in our daily lives, going beyond traditional education and entertainment functions into areas such as personal communications exchange, criminal evidence, surveillance, and marketing. With this functional diversity comes a variety of formats, including advancing compression and editing mechanisms to facilitate video creation and distribution. The advancements in video technology are important to cultural institutions, responsible for documenting society and for preserving video collections. Over time, these video collections grow without bound, severely encumbering the curation task. Accordingly, collecting institutions realize that individual and manual inspection, a traditional approach to assessing video quality and making subsequent preservation and access decisions, is an insurmountable task. Instead, novel, reliable, and automated methods are required for this purpose. Motivated by the need to develop curation solutions for large and varied video collections, this project investigates the use of Image and Video Quality Assessment (I/VQA) algorithms to generate data-driven, perceptually relevant indicators of video quality levels for large video collections. I/VQA algorithms are designed to predict the subjective quality of a natural image or video that has been digitally acquired, processed, communicated and displayed, as it would be perceived and reported by users [1]. Currently, such algorithms are used to assess the quality of images and videos in streaming applications, and to dynamically correct their distortions. In this project we explore whether and which I/VQA algorithms can be used to conduct large-scale automated assessment from which the need for more in-depth video analysis can be prioritized. We conducted experiments to understand the adequacy/scope and to refine the I/VQA algorithm BRISQUE, using a reference set of videos and a set of artistic videos as testbeds. All the experiments were run using High Performance Computing (HPC) resources.
Running parallel computational processes on HPC systems allows generating results for individual frames per video in a collection, promptly and accurately within one workflow. Interpreting these results entailed a qualitative evaluation, that is, viewing videos with frame-level quality predictions along with a graph indicating a holistic measure of quality over an entire video. In the context of a digital curation project, experimenting with these algorithms in an HPC environment benefits from an interdisciplinary approach. A collaboration between the Laboratory for Image and Video Engineering (LIVE, http://live.ece.utexas.edu), which conducts research in I/VQA, and the Texas Advanced Computing Center (TACC, http://www.tacc.utexas.edu), which deploys computational resources for open science research, our team combines the expertise of data curators and computational scientists with that of video engineers. In this paper we introduce the I/VQA algorithms, explain how they compare to current methods of estimating video quality in heritage video collections, present the experiments conducted to understand the fitness of the model for assessing video collections, and discuss the results obtained from testing the model on reference video sets and on a regular video collection.

I/VQA Algorithms

State-of-the-art I/VQA algorithms are based on natural scene statistics (NSS), which function under the premise that scenes have statistical regularities. Because the human visual system is tuned to distinguish regularities from irregularities, statistics sensitive to these variations in regularity have been shown to correlate well with difference mean opinion scores (DMOS) of images and video. To successfully map these statistics to a single perceptual quality score, these algorithms train on both images and videos that have corresponding opinion scores. These DMOS scores are computed from a set of subjective evaluations obtained from humans watching sets of videos that have specific types and degrees of distortions. These videos are rated using a continuous sliding scale with the labels “Worst,” “Poor,” “Fair,” “Good,” and “Excellent.” The user scores are combined to compute the DMOS score on the range [0-100], where 0 is “Excellent” and 100 is “Worst.” These human scores are necessary for measuring the impact that different distortions have on perceptual quality [1]. I/VQA algorithms can be full-reference (FR) or no-reference (NR). The former require as input a high-quality reference image or video against which a distorted copy can be compared. In the context of curation, an FR algorithm, the Structural Similarity Index (SSIM), was used to verify whether and to what degree the conversion of original video files involved information loss [2]. By contrast, NR algorithms measure the perceived quality of images and videos for which there is no original or pristine version available for comparison [1]. We propose that NR algorithms could be useful for understanding a collection’s quality without the need for humans to review each video. However, studies have to be conducted to understand which models can be used to assess quality in video collections that are varied in content and distortions. The focus of this paper is evaluating whether BRISQUE, an NR algorithm for image quality assessment that can be used to assess video, is appropriate for digital video curation.
Related Work

Collecting institutions have traditionally focused on digitizing analogue video for preservation and access, and a number of video QC tools have been introduced for purposes of automatic and objective quality assessment of digitized files [3, 4]. This is a great improvement over the traditional approach in which humans reviewed the files to detect both errors originating in the analogue media that was digitized and errors resulting from the digitization process. Indeed, while humans can identify different types of video distortions, manually recording them with precision is extremely time-consuming and inconsistent [5]. Aside from individual differences, popular QC tools identify various types of artifacts and noise in individual frames and across frame differences, producing frame-by-frame features [3] or averaged features [4] for each type of detected distortion. In turn, these results have to be interpreted to derive a holistic quality condition per video. Therefore, while these tools assist the human curation task, none of them eliminate the need for humans to view the videos. To accurately assess the condition of a video in a perceptually relevant context, these features must be mapped to a quality score that correlates significantly with human-based DMOS scores.

Our work differs in methods and scope from the above, serving a complementary function. As opposed to detecting errors based on distortion-specific filters and corresponding ranges of normalcy, we are introducing perceptual subjective measures based on models of the human visual system to understand the quality of individual digital videos within collections. Importantly, the scores produced by the I/VQA algorithms are statistically significant through their correlation with the consensus scores obtained from people who have rated the distortions in reference video sets. Such consensus can be understood as the collective interpretation of quality. In addition, our project does not focus on detecting analogue distortions or on evaluating the results of the digitization process, but on distortions that are typical of compression algorithms. Because we are interested in processing large video collections, we run the model on a supercomputer, allowing us to obtain DMOS predictions both holistically and at the per-frame scale. In addition, we performed a study without training on rated distortions, to remove subjectivity. In the following section we describe the testbed collections used to build and to evaluate our model, and the studies performed to determine its fitness for assessing the condition of large-scale video collections.
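The per-frame, per-video workflow described in this paper is embarrassingly parallel; the sketch below shows only that structure, with the frame-level predictor passed in as a function (a BRISQUE implementation in the actual experiments, which were run on TACC HPC resources rather than with Python's process pool).

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def score_video(args):
    """args: (video_id, frames, frame_scorer). Returns per-frame scores and a
    holistic per-video summary (here simply their mean)."""
    video_id, frames, frame_scorer = args
    scores = [frame_scorer(f) for f in frames]
    return video_id, scores, mean(scores)

def score_collection(videos, frame_scorer, workers=8):
    """videos: mapping video_id -> list of decoded frames.
    frame_scorer: any no-reference per-frame quality predictor; it is passed
    in rather than assumed here."""
    jobs = [(vid, frames, frame_scorer) for vid, frames in videos.items()]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return {vid: (scores, summary)
                for vid, scores, summary in pool.map(score_video, jobs)}
```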