
Showing papers by "Alan C. Bovik published in 2014"


Journal ArticleDOI
TL;DR: It is found that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images, combined with a novel pooling strategy (the standard deviation of the GMS map), can accurately predict perceptual image quality.
Abstract: It is an important task to faithfully evaluate the perceptual quality of output images in many applications, such as image compression, image restoration, and multimedia streaming. A good image quality assessment (IQA) model should not only deliver high prediction accuracy, but also be computationally efficient. The efficiency of IQA metrics is becoming particularly important due to the increasing proliferation of high-volume visual data in high-speed networks. We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). Image gradients are sensitive to image distortions, while different local structures in a distorted image suffer different degrees of degradation. This motivates us to explore the use of the global variation of a gradient-based local quality map for overall image quality prediction. We find that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images, combined with a novel pooling strategy (the standard deviation of the GMS map), can accurately predict perceptual image quality. The resulting GMSD algorithm is much faster than most state-of-the-art IQA methods, and delivers highly competitive prediction accuracy. MATLAB source code of GMSD can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
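
As a concrete illustration, here is a minimal NumPy/SciPy sketch of the GMS/GMSD computation described above, assuming 8-bit grayscale inputs. The Prewitt-style filters and the constant c = 170 follow the paper's description; the paper's 2x average-downsampling preprocessing step is omitted.

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(ref, dst, c=170.0):
    """Sketch of Gradient Magnitude Similarity Deviation (c is tuned
    for 8-bit intensities, per the paper; downsampling step omitted)."""
    hx = np.array([[1/3, 0, -1/3]] * 3)   # Prewitt-style horizontal filter
    hy = hx.T                             # vertical filter

    def grad_mag(img):
        gx = convolve(img.astype(float), hx)
        gy = convolve(img.astype(float), hy)
        return np.sqrt(gx**2 + gy**2)

    mr, md = grad_mag(ref), grad_mag(dst)
    gms = (2 * mr * md + c) / (mr**2 + md**2 + c)   # per-pixel similarity map
    return gms.std()                                # deviation pooling
```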

1,211 citations


Journal ArticleDOI
TL;DR: It is found that SSEQ matches well with human subjective opinions of image quality, and is statistically superior to the full-reference IQA algorithm SSIM and several top-performing NR IQA methods: BIQI, DIIVINE, and BLIINDS-II.
Abstract: We develop an efficient general-purpose no-reference (NR) image quality assessment (IQA) model that utilizes local spatial and spectral entropy features on distorted images. Using a 2-stage framework of distortion classification followed by quality assessment, we utilize a support vector machine (SVM) to train an image distortion and quality prediction engine. The resulting algorithm, dubbed Spatial–Spectral Entropy-based Quality (SSEQ) index, is capable of assessing the quality of a distorted image across multiple distortion categories. We explain the entropy features used and their relevance to perception and thoroughly evaluate the algorithm on the LIVE IQA database. We find that SSEQ matches well with human subjective opinions of image quality, and is statistically superior to the full-reference (FR) IQA algorithm SSIM and several top-performing NR IQA methods: BIQI, DIIVINE, and BLIINDS-II. SSEQ has considerably low computational complexity. We also tested SSEQ on the TID2008 database to ascertain whether its performance is database independent.
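
A hedged sketch of the two feature types SSEQ is built on: block-wise spatial entropy of intensities and spectral entropy of block-DCT energy. The block size, histogram settings, and simple mean pooling below are assumptions; the paper uses its own percentile pooling and multiscale processing before SVM training.

```python
import numpy as np
from scipy.fftpack import dct

def block_entropies(img, bs=8):
    """Per-block spatial and spectral (DCT) entropies, mean-pooled."""
    spatial, spectral = [], []
    h, w = img.shape
    for i in range(0, h - bs + 1, bs):
        for j in range(0, w - bs + 1, bs):
            b = img[i:i+bs, j:j+bs].astype(float)
            # Spatial entropy from the block's intensity histogram.
            p, _ = np.histogram(b, bins=256, range=(0, 255))
            p = p[p > 0] / p.sum()
            spatial.append(-(p * np.log2(p)).sum())
            # Spectral entropy from normalized 2D-DCT energy (DC excluded).
            d = dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
            e = d**2
            e[0, 0] = 0.0
            if e.sum() > 0:
                q = (e / e.sum()).ravel()
                q = q[q > 0]
                spectral.append(-(q * np.log2(q)).sum())
    return np.mean(spatial), np.mean(spectral)
```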

562 citations


Journal ArticleDOI
TL;DR: This work proposes a novel BIQA model that utilizes the joint statistics of two types of commonly used local contrast features: 1) the gradient magnitude (GM) map and 2) the Laplacian of Gaussian response.
Abstract: Blind image quality assessment (BIQA) aims to evaluate the perceptual quality of a distorted image without information regarding its reference image. Existing BIQA models usually predict the image quality by analyzing the image statistics in some transformed domain, e.g., in the discrete cosine transform domain or wavelet domain. Though great progress has been made in recent years, BIQA is still a very challenging task due to the lack of a reference image. Considering that image local contrast features convey important structural information that is closely related to image perceptual quality, we propose a novel BIQA model that utilizes the joint statistics of two types of commonly used local contrast features: 1) the gradient magnitude (GM) map and 2) the Laplacian of Gaussian (LOG) response. We employ an adaptive procedure to jointly normalize the GM and LOG features, and show that the joint statistics of normalized GM and LOG features have desirable properties for the BIQA task. The proposed model is extensively evaluated on three large-scale benchmark databases, and shown to deliver highly competitive performance with state-of-the-art BIQA models, as well as with some well-known full reference image quality assessment models.
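
To make the two local contrast maps concrete, the SciPy sketch below computes the gradient magnitude (GM) map and the Laplacian of Gaussian (LOG) response and bins them into a joint histogram. The smoothing scale and binning are placeholders, and the paper's joint adaptive normalization and marginal/conditional probability features are not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def gm_log_features(img, sigma=0.5, bins=10):
    """Joint histogram of GM and |LOG| responses (a hedged sketch)."""
    img = img.astype(float)
    gx = gaussian_filter(img, sigma, order=(0, 1))   # d/dx of smoothed image
    gy = gaussian_filter(img, sigma, order=(1, 0))   # d/dy of smoothed image
    gm = np.sqrt(gx**2 + gy**2)                      # gradient magnitude map
    log = gaussian_laplace(img, sigma)               # LOG response
    joint, _, _ = np.histogram2d(gm.ravel(), np.abs(log).ravel(),
                                 bins=bins, density=True)
    return joint.ravel()    # joint-statistics feature vector for a learner
```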

535 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed NSS and motion coherency models are appropriate for quality assessment of videos, and they are utilized to design a blind VQA algorithm that correlates highly with human judgments of quality.
Abstract: We propose a blind (no reference or NR) video quality evaluation model that is nondistortion specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality. We use the models to define video statistics and perceptual features that are the basis of a video quality assessment (VQA) algorithm that does not require the presence of a pristine video to compare against in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level of top performing reduced and full reference VQA algorithms.
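
To make the spatio-temporal NSS idea concrete, this hedged sketch fits a generalized Gaussian shape parameter to block-DCT coefficients of frame differences. The block size and the moment-matching fit are assumptions; the paper's frequency-band partitioning, temporal pooling, and motion-coherency model are omitted.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.special import gamma as G

def ggd_shape(x):
    """Moment-matching estimate of the generalized Gaussian shape
    parameter (the standard ratio method from the NSS literature)."""
    rho = np.mean(np.abs(x))**2 / np.mean(x**2)
    cands = np.arange(0.2, 10.0, 0.001)
    ratios = G(2 / cands)**2 / (G(1 / cands) * G(3 / cands))
    return cands[np.argmin((ratios - rho)**2)]

def frame_diff_shapes(frames, bs=5):
    """Shape parameters of frame-difference DCT coefficients, per frame pair."""
    shapes = []
    for f0, f1 in zip(frames[:-1], frames[1:]):
        d = f1.astype(float) - f0.astype(float)
        h, w = d.shape
        coefs = []
        for i in range(0, h - bs + 1, bs):
            for j in range(0, w - bs + 1, bs):
                b = dct(dct(d[i:i+bs, j:j+bs], axis=0, norm='ortho'),
                        axis=1, norm='ortho')
                coefs.append(b.ravel()[1:])   # drop the DC coefficient
        shapes.append(ggd_shape(np.concatenate(coefs)))
    return np.asarray(shapes)
```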

383 citations


Journal ArticleDOI
TL;DR: The resulting algorithm, dubbed CurveletQA, correlates well with human subjective opinions of image quality, delivering performance that is competitive with popular full-reference IQA algorithms such as SSIM, and with top-performing NR IQA models.
Abstract: We study the efficacy of utilizing a powerful image descriptor, the curvelet transform, to learn a no-reference (NR) image quality assessment (IQA) model. A set of statistical features is extracted from a computed image curvelet representation, including the coordinates of the maxima of the log-histograms of the curvelet coefficient values, and the energy distributions of both orientation and scale in the curvelet domain. Our results indicate that these features are sensitive to the presence and severity of image distortion. Operating within a 2-stage framework of distortion classification followed by quality assessment, we train an image distortion and quality prediction engine using a support vector machine (SVM). The resulting algorithm, dubbed CurveletQA for short, was tested on the LIVE IQA database and compared to state-of-the-art NR/FR IQA algorithms. We found that CurveletQA correlates well with human subjective opinions of image quality, delivering performance that is competitive with popular full-reference (FR) IQA algorithms such as SSIM, and with top-performing NR IQA models. At the same time, CurveletQA has a relatively low complexity.

176 citations


Journal ArticleDOI
TL;DR: This paper presents a complex extension of the DIIVINE algorithm (called C-DIIVINE), which blindly assesses image quality based on the complex Gaussian scale mixture model corresponding to the complex version of the steerable pyramid wavelet transform.
Abstract: It is widely known that the wavelet coefficients of natural scenes possess certain statistical regularities which can be affected by the presence of distortions. The DIIVINE (Distortion Identification-based Image Verity and Integrity Evaluation) algorithm is a successful no-reference image quality assessment (NR IQA) algorithm, which estimates quality based on changes in these regularities. However, DIIVINE operates based on real-valued wavelet coefficients, whereas the visual appearance of an image can be strongly determined by both the magnitude and phase information. In this paper, we present a complex extension of the DIIVINE algorithm (called C-DIIVINE), which blindly assesses image quality based on the complex Gaussian scale mixture model corresponding to the complex version of the steerable pyramid wavelet transform. Specifically, we applied three commonly used distribution models to fit the statistics of the wavelet coefficients: (1) the complex generalized Gaussian distribution is used to model the wavelet coefficient magnitudes, (2) the generalized Gaussian distribution is used to model the coefficients' relative magnitudes, and (3) the wrapped Cauchy distribution is used to model the coefficients' relative phases. All these distributions have characteristic shapes that are consistent across different natural images but change significantly in the presence of distortions. We also employ the complex wavelet structural similarity index to measure degradation of the correlations across image scales, which serves as an important indicator of the image's energy distribution and the loss of alignment of local spectral components contributing to image structure. Experimental results show that these complex extensions allow C-DIIVINE to yield a substantial improvement in predictive performance as compared to its predecessor, and highly competitive performance relative to other recent no-reference algorithms.
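
The wrapped Cauchy ingredient is available directly in SciPy. The toy example below fits it to synthetic phase samples; in actual use one would fit the relative phases of complex steerable pyramid coefficients, which are not computed here.

```python
import numpy as np
from scipy.stats import wrapcauchy

# Synthetic stand-in for relative-phase samples on [0, 2*pi).
rng = np.random.default_rng(0)
phases = np.mod(rng.vonmises(0.0, 4.0, 5000), 2 * np.pi)

# Fit only the concentration parameter c (location/scale pinned).
c, loc, scale = wrapcauchy.fit(phases, floc=0, fscale=1)
print(f"wrapped Cauchy concentration c = {c:.3f}")  # shifts under distortion
```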

106 citations


Journal ArticleDOI
TL;DR: A Hammerstein-Wiener model is presented for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos and it is shown that the model is able to reliably predict the TVSQ of rate adaptive videos.
Abstract: Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable flexible rate-adaptation under varying channel conditions. Accurately predicting the users' quality of experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieve efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer duration videos with time-varying distortions was built and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online TVSQ prediction in HTTP-based streaming.
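
The Hammerstein-Wiener structure itself is compact: a memoryless input nonlinearity, a linear dynamic block, and a memoryless output nonlinearity. The sketch below uses tanh nonlinearities and a first-order IIR filter purely as placeholders; the paper's fitted nonlinearities and filter orders are not reproduced.

```python
import numpy as np
from scipy.signal import lfilter

def hammerstein_wiener(q, b, a, f=np.tanh, g=np.tanh):
    """q: per-second short-term quality trace; returns a TVSQ-like trace."""
    u = f(q)              # static input nonlinearity (e.g., saturation)
    x = lfilter(b, a, u)  # linear dynamics model hysteresis/smoothing
    return g(x)           # static output nonlinearity

# Example: smooth a noisy quality trace with slow exponential memory.
q = np.random.default_rng(1).uniform(-1, 1, 200)
tvsq = hammerstein_wiener(q, b=[0.1], a=[1, -0.9])  # DC gain of 1
```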

97 citations


Journal ArticleDOI
TL;DR: A new 3D saliency prediction model that accounts for diverse low-level luminance, chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type is described.
Abstract: We describe a new 3D saliency prediction model that accounts for diverse low-level luminance, chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type. The model also accounts for perceptual factors, such as the nonuniform resolution of the human eye, stereoscopic limits imposed by Panum's fusional area, and the predicted degree of (dis)comfort felt when viewing the 3D video. The high-level analysis involves classification of each 3D video scene by type with regard to estimated camera motion and the motions of objects in the videos. Decisions regarding the relative saliency of objects or regions are supported by data obtained through a series of eye-tracking experiments. The algorithm developed from the model elements operates by finding and segmenting salient 3D space-time regions in a video, then calculating the saliency strength of each segment using measured attributes of motion, disparity, texture, and the predicted degree of visual discomfort experienced. The saliency energy of both segmented objects and frames is weighted using models of human foveation and Panum's fusional area, yielding a single predictor of 3D saliency.

95 citations


Journal ArticleDOI
TL;DR: The 3D-AVM Predictor accounts for anomalous motor responses of both accommodation and vergence, yielding predictive power that is statistically superior to prior models that rely on a computed disparity distribution only.
Abstract: To achieve clear binocular vision, neural processes that accomplish accommodation and vergence are performed via two collaborative, cross-coupled processes: accommodation-vergence (AV) and vergence-accommodation (VA). However, when people watch stereo images on stereoscopic displays, normal neural functioning may be disturbed owing to anomalies of the cross-link gains. These anomalies are likely the main cause of visual discomfort experienced when viewing stereo images, and are called Accommodation-Vergence Mismatches (AVM). Moreover, the absence of any useful accommodation depth cues when viewing 3D content on a flat panel (planar) display induces anomalous demands on binocular fusion, resulting in possible additional visual discomfort. Most prior efforts in this direction have focused on predicting anomalies in the AV cross-link using measurements on a computed disparity map. We further these contributions by developing a model that accounts for both accommodation and vergence, resulting in a new visual discomfort prediction algorithm dubbed the 3D-AVM Predictor. The 3D-AVM model and algorithm make use of a new concept we call local 3D bandwidth (BW) which is defined in terms of the physiological optics of binocular vision and foveation. The 3D-AVM Predictor accounts for anomalous motor responses of both accommodation and vergence, yielding predictive power that is statistically superior to prior models that rely on a computed disparity distribution only.

65 citations


Journal ArticleDOI
TL;DR: A new video quality model (VQM) that accounts for the perceptual impact of variable frame delays (VFD) in videos with demonstrated top performance on the laboratory for image and video engineering (LIVE) mobile video quality assessment ( VQA) database.
Abstract: We announce a new video quality model (VQM) that accounts for the perceptual impact of variable frame delays (VFD) in videos with demonstrated top performance on the laboratory for image and video engineering (LIVE) mobile video quality assessment (VQA) database. This model, called VQM_VFD, uses perceptual features extracted from spatial-temporal blocks spanning fixed angular extents and a long edge detection filter. VQM_VFD predicts video quality by measuring multiple frame delays using perception-based parameters to track subjective quality over time. In the performance analysis of VQM_VFD, we evaluated its efficacy at predicting human opinions of visual quality. A detailed correlation analysis and statistical hypothesis testing show that VQM_VFD accurately predicts human subjective judgments and substantially outperforms top-performing image quality assessment and VQA models previously tested on the LIVE mobile VQA database. VQM_VFD achieved the best performance on the mobile and tablet studies of the LIVE mobile VQA database for simulated compression, wireless packet-loss, and rate adaptation, but not for temporal dynamics. These results validate the new model and warrant a hard release of the VQM_VFD algorithm. It is freely available for any purpose, commercial or noncommercial, at http://www.its.bldrdoc.gov/vqm/.

60 citations


Proceedings ArticleDOI
05 Feb 2014
TL;DR: A new mobile video database that models distortions caused by network impairments and is making the database publicly available in order to help advance state-of-the-art research on user-centric mobile network planning and management.
Abstract: We have created a new mobile video database that models distortions caused by network impairments. In particular, we simulate stalling events and startup delays in over-the-top (OTT) mobile streaming videos. We describe the way we simulated diverse stalling events to create a corpus of distorted videos and the human study we conducted to obtain subjective scores. We also analyzed the ratings to understand the impact of several factors that influence the quality of experience (QoE). To the best of our knowledge, ours is the most comprehensive and diverse study on the effects of stalling events on QoE. We are making the database publicly available [1] in order to help advance state-of-the-art research on user-centric mobile network planning and management.

Proceedings ArticleDOI
05 Feb 2014
TL;DR: A novel natural-scene-statistics-based blind image quality assessment model that is created by training a deep belief net to discover good feature representations that are used to learn a regressor for quality prediction is presented.
Abstract: We present a novel natural-scene-statistics-based blind image quality assessment model that is created by training a deep belief net to discover good feature representations that are used to learn a regressor for quality prediction. The proposed deep model has an unsupervised pre-training stage followed by a supervised fine-tuning stage, enabling it to generalize over different distortion types, mixtures, and severities. We evaluated our new model on a recently created database of images afflicted by real distortions, and show that it outperforms current state-of-the-art blind image quality prediction models.
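
A scaled-down stand-in for the two-stage idea using scikit-learn: stacked RBMs approximate the unsupervised pre-training of a deep belief net, and an SVR stands in for the supervised quality regressor (the paper's end-to-end fine-tuning is not reproduced). The feature dimensions and data below are placeholders.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((500, 36))   # placeholder NSS feature vectors in [0, 1]
y = rng.random(500) * 100   # placeholder subjective quality scores

model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)),
    ("svr", SVR(C=10.0)),   # supervised stage on the learned representation
])
model.fit(X, y)
print(model.predict(X[:3]))
```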

Journal ArticleDOI
TL;DR: Experimental results show that the proposed SVC algorithm achieves high correlation with human judgments when assessing the blur distortion of images and is well-suited for real-time applications.

Journal ArticleDOI
03 Dec 2014-PLOS ONE
TL;DR: A fully automated microfluidic platform for performing laser axotomies of fluorescently tagged neurons in living Caenorhabditis elegans establishes a promising methodology for prospective genome-wide screening of nerve regeneration in C. elegans in a truly high-throughput manner.
Abstract: Femtosecond laser nanosurgery has been widely accepted as an axonal injury model, enabling nerve regeneration studies in the small model organism, Caenorhabditis elegans. To overcome the time limitations of manual worm handling techniques, automation and new immobilization technologies must be adopted to improve throughput in these studies. While new microfluidic immobilization techniques have been developed that promise to reduce the time required for axotomies, there is a need for automated procedures to minimize the required amount of human intervention and accelerate the axotomy processes crucial for high throughput. Here, we report a fully automated microfluidic platform for performing laser axotomies of fluorescently tagged neurons in living Caenorhabditis elegans. The presented automation process reduces the time required to perform axotomies within individual worms to ∼17 s/worm, at least one order of magnitude faster than manual approaches. The full automation is achieved with a unique chip design and an operation sequence that is fully computer controlled and synchronized with efficient and accurate image processing algorithms. The microfluidic device includes a T-shaped architecture and three-dimensional microfluidic interconnects to serially transport, position, and immobilize worms. The image processing algorithms can identify and precisely position axons targeted for ablation. There were no statistically significant differences observed in reconnection probabilities between axotomies carried out with the automated system and those performed manually with anesthetics. The overall success rate of automated axotomies was 67.4±3.2% of the cases (236/350), with an average processing time of 17.0±2.4 s per worm. This fully automated platform establishes a promising methodology for prospective genome-wide screening of nerve regeneration in C. elegans in a truly high-throughput manner.

Journal ArticleDOI
TL;DR: A new framework for quantifying 3D visual information is proposed and applied to the problem of predicting visual fatigue experienced when viewing 3D displays; the 3DVA fits the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model, combined with a set of 3D perceptual weights.
Abstract: One of the most challenging ongoing issues in the field of 3D visual research is how to perceptually quantify object and surface visualizations that are displayed within a virtual 3D space between a human eye and 3D display. To seek an effective method of quantification, it is necessary to measure various elements related to the perception of 3D objects at different depths. We propose a new framework for quantifying 3D visual information that we call 3D visual activity (3DVA), which utilizes natural scene statistics measured over 3D visual coordinates. We account for important aspects of 3D perception by carrying out a 3D coordinate transform reflecting the nonuniform sampling resolution of the eye and the process of stereoscopic fusion. The 3DVA fits the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model, combined with a set of 3D perceptual weights. We conducted a series of simulations that demonstrate the effectiveness of the 3DVA for quantifying the statistical dynamics of visual 3D space with respect to disparity, motion, texture, and color. A successful example application is also provided, whereby 3DVA is applied to the problem of predicting visual fatigue experienced when viewing 3D displays.

Journal ArticleDOI
TL;DR: It is found that 3D quality of experience (QoE) assessment results obtained using MICSQ are more reliable over a wide dynamic range of content than obtained by the conventional single stimulus continuous quality evaluation (SSCQE) protocol.
Abstract: People experience a variety of 3D visual programs, such as 3D cinema, 3D TV and 3D games, making it necessary to deploy reliable methodologies for predicting each viewer's subjective experience. We propose a new methodology that we call multimodal interactive continuous scoring of quality (MICSQ). MICSQ is composed of a device interaction process between the 3D display and a separate device (PC, tablet, etc.) used as an assessment tool, and a human interaction process between the subject(s) and the separate device. The scoring process is multimodal, using aural and tactile cues to help engage and focus the subject(s) on their tasks by enhancing neuroplasticity. Recorded human responses to 3D visualizations obtained via MICSQ correlate highly with measurements of spatial and temporal activity in the 3D video content. We have also found that 3D quality of experience (QoE) assessment results obtained using MICSQ are more reliable over a wide dynamic range of content than obtained by the conventional single stimulus continuous quality evaluation (SSCQE) protocol. Moreover, the wireless device interaction process makes it possible for multiple subjects to assess 3D QoE simultaneously in a large space such as a movie theater, at different viewing angles and distances. We conducted a series of 3D experiments demonstrating the accuracy and versatility of the new system, yielding new findings on visual comfort with respect to disparity and motion, as well as an interesting relation between the naturalness and depth of field (DOF) of a stereo camera.

Journal ArticleDOI
TL;DR: This work has developed a no-reference framework for automatically predicting the perceptual quality of camera-shaken images based on their spectral statistics, and demonstrates the performance of an algorithm derived from these features on new and existing databases of images distorted by camera shake.
Abstract: The tremendous explosion of image-, video-, and audio-enabled mobile devices, such as tablets and smartphones in recent years, has led to an associated dramatic increase in the volume of captured and distributed multimedia content. In particular, the number of digital photographs being captured annually is approaching 100 billion in the U.S. alone. These pictures are increasingly being acquired by inexperienced, casual users under highly diverse conditions leading to a plethora of distortions, including blur induced by camera shake. In order to be able to automatically detect, correct, or cull images impaired by shake-induced blur, it is necessary to develop distortion models specific to and suitable for assessing the sharpness of camera-shaken images. Toward this goal, we have developed a no-reference framework for automatically predicting the perceptual quality of camera-shaken images based on their spectral statistics. Two kinds of features are defined that capture blur induced by camera shake. One is a directional feature, which measures the variation of the image spectrum across orientations. The second feature captures the shape, area, and orientation of the spectral contours of camera shaken images. We demonstrate the performance of an algorithm derived from these features on new and existing databases of images distorted by camera shake.
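
The directional feature can be sketched in a few lines: measure how unevenly Fourier energy is spread across orientations, since linear camera shake concentrates spectral energy perpendicular to the blur direction. The binning and the spread statistic below are assumptions; the paper's spectral-contour shape features are not reproduced.

```python
import numpy as np

def orientation_energy_spread(img, nbins=36):
    """Coefficient of variation of spectral energy across orientations."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(float))))**2
    h, w = f.shape
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    theta = np.mod(np.arctan2(yy, xx), np.pi)   # orientation of each bin
    idx = np.minimum((theta / np.pi * nbins).astype(int), nbins - 1)
    energy = np.bincount(idx.ravel(), weights=f.ravel(), minlength=nbins)
    energy /= energy.sum()
    return energy.std() / energy.mean()   # large spread suggests shake blur
```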

Proceedings ArticleDOI
01 Nov 2014
TL;DR: A new image quality database that models diverse authentic image distortions and artifacts that affect images captured using modern mobile devices, together with a new online crowdsourcing system that we are using to conduct a very large-scale, ongoing, multi-month image quality assessment (IQA) subjective study.
Abstract: We designed and created a new image quality database that models diverse authentic image distortions and artifacts that affect images captured using modern mobile devices. We also designed and implemented a new online crowdsourcing system, which we are using to conduct a very large-scale, ongoing, multi-month image quality assessment (IQA) subjective study, wherein a wide range of diverse observers record their judgments of image quality. Our database currently consists of over 320,000 opinion scores on 1,163 authentically distorted images evaluated by over 7,000 human observers. The new database will soon be made freely available for download, and we envision that the fruits of our efforts will provide researchers with a valuable tool to benchmark and improve the performance of objective IQA algorithms.

Proceedings ArticleDOI
28 Jan 2014
TL;DR: This paper introduces an objective model called the delivery quality score (DQS) model, to predict user's QoE in the presence of such impairments, and demonstrates that the DQS model correlates highly with the subjective data and that it outperforms other emerging models.
Abstract: The vast majority of today's internet video services are consumed over-the-top (OTT) via reliable streaming (HTTP via TCP), where the primary noticeable delivery-related impairments are startup delay and stalling. In this paper we introduce an objective model called the delivery quality score (DQS) model, to predict users' QoE in the presence of such impairments. We describe a large subjective study that we carried out to tune and validate this model. Our experiments demonstrate that the DQS model correlates highly with the subjective data and that it outperforms other emerging models.

Journal ArticleDOI
TL;DR: A new set of features, called qualHOG, is proposed for robust face detection; it augments face-indicative Histogram of Oriented Gradients features with perceptual quality-aware spatial Natural Scene Statistics features, and provides statistically significant improvement in tolerance to image distortions over a strong baseline.
Abstract: Motivated by the proliferation of low-cost digital cameras in mobile devices being deployed in automated surveillance networks, we study the interaction between perceptual image quality and a classic computer vision task of face detection. We quantify the degradation in performance of a popular and effective face detector when human-perceived image quality is degraded by distortions commonly occurring in capture, storage, and transmission of facial images, including noise, blur, and compression. It is observed that, within a certain range of perceived image quality, a modest increase in image quality can drastically improve face detection performance. These results can be used to guide resource or bandwidth allocation in acquisition or communication/delivery systems that are associated with face detection tasks. A new set of features, called qualHOG, is proposed for robust face detection, which augments face-indicative Histogram of Oriented Gradients (HOG) features with perceptual quality-aware spatial Natural Scene Statistics (NSS) features. Face detectors trained on these new features provide statistically significant improvement in tolerance to image distortions over a strong baseline. Distortion-dependent and distortion-unaware variants of the face detectors are proposed and evaluated on a large database of face images representing a wide range of distortions. A biased variant of the training algorithm is also proposed that further enhances the robustness of these face detectors. To facilitate this research, we created a new distorted face database (DFD), containing face and non-face patches from images impaired by a variety of common distortion types and levels. This new data set and relevant code are available for download and further experimentation at www.live.ece.utexas.edu/research/Quality/index.htm.
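
A minimal sketch of the qualHOG construction: concatenate HOG features with quality-aware spatial NSS statistics computed from mean-subtracted contrast-normalized (MSCN) coefficients. The particular NSS summary statistics chosen here are placeholders for the paper's parameterized NSS features.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import hog

def mscn(img, sigma=7/6):
    """Spatial NSS: mean-subtracted contrast-normalized coefficients."""
    img = img.astype(float)
    mu = gaussian_filter(img, sigma)
    var = gaussian_filter(img**2, sigma) - mu**2
    return (img - mu) / (np.sqrt(np.abs(var)) + 1.0)

def qualhog_like(patch):
    """HOG features augmented with simple MSCN summary statistics."""
    h = hog(patch, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
    m = mscn(patch)
    nss = np.array([m.mean(), m.var(), np.abs(m).mean(), (m**4).mean()])
    return np.concatenate([h, nss])   # feed to an SVM face detector
```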

Proceedings ArticleDOI
TL;DR: By utilizing robust bivariate models, this work is able to incorporate measurements of bivariate statistics between spatially adjacent luminance/chrominance and range information into various 3D image/video and computer vision applications, e.g., quality assessment, 2D-to-3D conversion, etc.
Abstract: The statistical properties embedded in visual stimuli from the surrounding environment guide and affect the evolutionary processes of human vision systems. There are strong statistical relationships between co-located luminance/chrominance and disparity bandpass coefficients in natural scenes. However, these statistical relationships have only been deeply developed to create point-wise statistical models, although there exist spatial dependencies between adjacent pixels in both 2D color images and range maps. Here we study the bivariate statistics of the joint and conditional distributions of spatially adjacent bandpass responses on both luminance/chrominance and range data of naturalistic scenes. We deploy bivariate generalized Gaussian distributions to model the underlying statistics. The analysis and modeling results show that there exist important and useful statistical properties of both joint and conditional distributions, which can be reliably described by the corresponding bivariate generalized Gaussian models. Furthermore, by utilizing these robust bivariate models, we are able to incorporate measurements of bivariate statistics between spatially adjacent luminance/chrominance and range information into various 3D image/video and computer vision applications, e.g., quality assessment, 2D-to-3D conversion, etc.

Proceedings ArticleDOI
TL;DR: A perceptual fog density prediction model based on natural scene statistics and “fog aware” statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, and without training on human-rated judgments.
Abstract: We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and “fog aware” statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The predicted fog density of the model correlates well with the measured visibility in a foggy scene as measured by judgments taken in a human subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.
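
To illustrate the kinds of measurable deviations involved, the sketch below computes three simple fog-sensitive statistics on a single RGB image (fog tends to raise mean luminance, lower saturation, and lower local contrast). The paper's full fog-aware NSS feature set and its comparison against statistics of fog-free images are omitted.

```python
import numpy as np

def fog_sensitive_stats(rgb):
    """Mean luminance, saturation, and local RMS contrast of an RGB image."""
    rgb = rgb.astype(float) / 255.0
    lum = rgb.mean(axis=2)
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    sat = np.where(mx > 0, (mx - mn) / (mx + 1e-6), 0.0)
    h, w = lum.shape
    contrast = [lum[i:i+8, j:j+8].std()                  # 8x8 patch contrast
                for i in range(0, h - 7, 8) for j in range(0, w - 7, 8)]
    return {"mean_luminance": float(lum.mean()),         # fog shifts intensity
            "mean_saturation": float(sat.mean()),        # fog fades color
            "mean_contrast": float(np.mean(contrast))}   # fog lowers contrast
```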

Journal ArticleDOI
TL;DR: Two new general blind image quality assessment (IQA) indices that respectively use the area and curvature of image reciprocal singular value curves are described, which can handle multiple unknown distortions and are no-training methods.
Abstract: The reciprocal singular value curves of natural images resemble inverse power functions. The bending degree of the reciprocal singular value curve varies with distortion type and severity. We describe two new general blind image quality assessment (IQA) indices that respectively use the area and curvature of image reciprocal singular value curves. These two methods require very little prior knowledge of any image or distortion and no training process, and they can handle multiple unknown distortions; hence they are training-free methods. Experimental results on five simulated databases show that the proposed algorithms deliver quality predictions that correlate highly with human subjective judgments, and that are competitive with other blind IQA models. Highlights: We identify a relationship between image distortion and the reciprocal singular value curve. We construct two new general blind IQA indices that respectively use the area and curvature of image reciprocal singular value curves. The proposed indices have the following advantages: (1) a simple mathematical expression leads to low computational complexity; (2) they can be applied to more distortion categories, such as "high frequency noise" and "WN-color."
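
The area index, at least, is easy to state. A sketch, with the normalization treated as an assumption since the paper's exact formulation is not reproduced here:

```python
import numpy as np

def rsv_area(img):
    """Area under the normalized reciprocal singular value curve."""
    s = np.linalg.svd(img.astype(float), compute_uv=False)
    r = 1.0 / (s + 1e-12)          # reciprocal singular values
    r = r / r.max()                # normalize to [0, 1]
    x = np.linspace(0.0, 1.0, r.size)
    return np.trapz(r, x)          # bending degree summarized by the area
```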

Proceedings ArticleDOI
14 Jul 2014
TL;DR: The experimental results show that the combination of binocular contrast, structural dissimilarity and average luminance exhibits high consistency with subjective scores of visual discomfort, fusion difficulty and overall binocular mismatches in terms of Spearman's Rank Ordered Correlation Coefficient.
Abstract: Luminance discrepancies between image pairs occur owing to inconsistent parameters between stereoscopic camera devices and from imperfect capture conditions. Such discrepancies induce binocular mismatches and affect the visual comfort that is felt by viewers, as well as their ability to fuse stereoscopic images. To better understand and observe this effect, we built a stereoscopic image database of 240 luminance-discrepancy images and 30 natural images with subjective scores of visual discomfort and fusion difficulty. Two features, binocular contrast and luminance similarity, were extracted to analyze the relationship between the subjective scores and the luminance discrepancies. Structural dissimilarity and average luminance are used to predict the effects of binocular mismatches. The experimental results show that the combination of binocular contrast, structural dissimilarity and average luminance exhibits high consistency with subjective scores of visual discomfort, fusion difficulty and overall binocular mismatches in terms of Spearman's Rank Ordered Correlation Coefficient.

Proceedings ArticleDOI
06 Apr 2014
TL;DR: The proposed defog and visibility enhancer makes use of statistical regularities observed in foggy and fog-free images to extract the most visible information from three processed image results: one white balanced and two contrast enhanced images.
Abstract: We propose a referenceless perceptual defog and visibility enhancement model based on multiscale “fog aware” statistical features. Our model operates on a single foggy image and uses a set of “fog aware” weight maps to improve the visibility of foggy regions. The proposed defog and visibility enhancer makes use of statistical regularities observed in foggy and fog-free images to extract the most visible information from three processed image results: one white balanced and two contrast enhanced images. Perceptual fog density, fog aware luminance, contrast, saturation, chrominance, and saliency weight maps smoothly blend these via a Laplacian pyramid. Evaluation on a variety of foggy images shows that the proposed model achieves better results for darker, denser foggy images as well as on standard defog test images.
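
The fusion step follows the standard Laplacian-pyramid blending pattern. The sketch below assumes grayscale inputs and OpenCV; it blends the derived images under their normalized weight maps. The construction of the white-balanced and contrast-enhanced inputs and of the fog-aware weight maps themselves is not shown.

```python
import numpy as np
import cv2

def fuse_multiscale(images, weights, levels=4):
    """Blend derived images under weight maps via Laplacian pyramids."""
    # Normalize the fog-aware weight maps so they sum to 1 at every pixel.
    wsum = np.sum(weights, axis=0) + 1e-6
    blended = None
    for img, w in zip(images, weights):
        gp = [(w / wsum).astype(np.float32)]   # Gaussian pyramid of weights
        ip = [img.astype(np.float32)]          # Gaussian pyramid of the image
        for _ in range(levels):
            gp.append(cv2.pyrDown(gp[-1]))
            ip.append(cv2.pyrDown(ip[-1]))
        # Laplacian pyramid: detail at each scale plus the coarse residual.
        lap = [ip[k] - cv2.pyrUp(ip[k + 1], dstsize=ip[k].shape[1::-1])
               for k in range(levels)] + [ip[-1]]
        terms = [lv * gv for lv, gv in zip(lap, gp)]
        blended = terms if blended is None else [b + t for b, t in zip(blended, terms)]
    # Collapse the blended pyramid back into a single image.
    out = blended[-1]
    for k in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=blended[k].shape[1::-1]) + blended[k]
    return np.clip(out, 0, 255).astype(np.uint8)
```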

Proceedings ArticleDOI
06 Apr 2014
TL;DR: The application of modern blind IQA models are extended to study whether quality prediction on other image modality can find practical use.
Abstract: Recent work on the problem of Image Quality Assessment (IQA) has produced accurate subjective quality evaluators for visible light images. Two such algorithms are the Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) and the Natural Image Quality Evaluator (NIQE). Both models are useful in that they correlate highly with human visual perception of image quality. Given that other kinds of non-visible light images are also 'natural' projections of the world, and can be distorted thereby reducing the perceived quality, it is of interest to study whether quality prediction on other image modality can find practical use. To this end we have extended the application of modern blind IQA models.

Journal ArticleDOI
TL;DR: A simple filter-based model is created that successfully captures the psychophysical data over a wide range of velocities and flicker frequencies and finds that the threshold of silencing occurs when the log frequency of object replacement is roughly one quarter of the log flicker frequency.
Abstract: Motion can impair the perception of other visual changes. Suchow and Alvarez (2011a, Current Biology, 21, 140-143) recently demonstrated a striking 'motion silencing' illusion, in which the salient changes among a group of objects' luminances (or colors, etc.) appear to cease in the presence of large, coherent object motion. To understand why the visual system might be insensitive to changes in object luminances ('flicker') in the presence of object motion, we constructed similar stimuli and performed a systematic spectral analysis of them. We conducted human psychophysical experiments to examine motion silencing as a function of stimulus velocity, flicker frequency, and spacing; and we created a simple filter-based model as a working hypothesis of motion silencing. From the results, we found that the threshold of silencing occurs when the log frequency of object replacement is roughly one quarter of the log flicker frequency (the mean slope is approximately 0.27). The dependence of silencing on object spacing may be explained as a phenomenon of temporal sampling of the stimuli by the visual system. Our proposed model successfully captures the psychophysical data over a wide range of velocities and flicker frequencies.
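
In equation form, the reported threshold behavior is a log-log linear relation between the object-replacement frequency f_r and the flicker frequency f_f at the silencing threshold; the intercept kappa is left unspecified here since the abstract reports only the mean slope.

```latex
% silencing threshold: log replacement frequency grows with log flicker
% frequency with mean slope ~0.27 (roughly one quarter); \kappa unspecified
\log_{10} f_r \approx 0.27\, \log_{10} f_f + \kappa
```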

Journal ArticleDOI
TL;DR: The results from this study suggest that radiologists who can perceive stereo can reliably interpret breast tomosynthesis projection images using stereoscopic viewing.
Abstract: The purpose of this study was to evaluate stereoscopic perception of low-dose breast tomosynthesis projection images. In this Institutional Review Board exempt study, craniocaudal breast tomosynthesis cases (N = 47), consisting of 23 biopsy-proven malignant mass cases and 24 normal cases, were retrospectively reviewed. A stereoscopic pair comprised of two projection images that were ±4° apart from the zero angle projection was displayed on a Planar PL2010M stereoscopic display (Planar Systems, Inc., Beaverton, OR, USA). An experienced breast imager verified the truth for each case stereoscopically. A two-phase blinded observer study was conducted. In the first phase, two experienced breast imagers rated their ability to perceive 3D information using a scale of 1–3 and described the most suspicious lesion using the BI-RADS® descriptors. In the second phase, four experienced breast imagers were asked to make a binary decision on whether they saw a mass for which they would initiate a diagnostic workup or not and also report the location of the mass and provide a confidence score in the range of 0–100. The sensitivity and the specificity of the lesion detection task were evaluated. The results from our study suggest that radiologists who can perceive stereo can reliably interpret breast tomosynthesis projection images using stereoscopic viewing.

Journal ArticleDOI
TL;DR: The best methods are shown to accurately predict subjective opinions of the quality of printed photographs using data from a psychometric study.
Abstract: Measuring the visual quality of printed media is important since printed products play an important role in everyday life. Finding ways to automatically predict image quality has been an active research topic in digital image processing, but adapting those methods to measure the visual quality of printed media has not been studied often or in depth and is not straightforward. Here, we analyze the efficacy of no-reference image quality assessment (IQA) algorithms originally developed for digital IQA with regard to predicting the perceived quality of printed natural images. We perform a comprehensive statistical comparison of the methods. The best methods are shown to accurately predict subjective opinions of the quality of printed photographs using data from a psychometric study. (DOI: 10.1117/1.JEI.23.6.061106)

Proceedings ArticleDOI
28 Jan 2014
TL;DR: A rate-adaptation algorithm is proposed that can incorporate QoE constraints on the empirical cumulative quality distribution per user and can reduce network resource consumption over conventional average-quality maximized rate- Adaptation algorithms.
Abstract: We conducted a subjective study wherein we found that viewers' Quality of Experience (QoE) was strongly correlated with the empirical cumulative distribution function (eCDF) of the predicted video quality. Based on this observation, we propose a rate-adaptation algorithm that can incorporate QoE constraints on the empirical cumulative quality distribution per user. Simulation results show that the proposed technique can reduce network resource consumption by 29% over conventional average-quality maximized rate-adaptation algorithms.
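
A toy sketch of an eCDF-style QoE constraint of the kind described: require that a given fraction of a session's predicted per-segment quality scores exceed a floor. The constraint form, threshold, and fraction here are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def satisfies_qoe_constraint(pred_quality, q_min=60.0, frac=0.9):
    """True if at least `frac` of predicted segment qualities exceed q_min."""
    q = np.sort(np.asarray(pred_quality, dtype=float))
    ecdf_at_qmin = np.searchsorted(q, q_min, side='right') / q.size
    return 1.0 - ecdf_at_qmin >= frac

# A rate adapter would pick the highest bitrate whose predicted quality
# trace still satisfies the constraint.
print(satisfies_qoe_constraint([55, 70, 80, 90, 65], q_min=60, frac=0.75))
```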