Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher at the University of Texas at Austin. He has contributed to research in the topics of image quality and video quality, has an h-index of 102, and has co-authored 837 publications receiving 96,088 citations. Previous affiliations of Alan C. Bovik include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Journal ArticleDOI
TL;DR: A model incorporating a novel luminance flicker detector is developed, positing that, given limited resources to detect temporal change, all temporal change is interpreted as motion when a certain amount of actual motion exists.
Abstract: A Flicker Detector Model of the Motion Silencing Illusion. Lark Kwon Choi(1,3), Alan Conrad Bovik(1,3), Lawrence Kevin Cormack(2,3); (1) Department of Electrical and Computer Engineering, The University of Texas at Austin; (2) Department of Psychology, The University of Texas at Austin; (3) Center for Perceptual Systems, The University of Texas at Austin. The perception of motion and change is an important mechanism in the visual system. Suchow and Alvarez recently presented a "motion silencing" illusion, in which salient flicker (spatially localized repetitive changes in luminance, color, shape, or size) becomes undetectable in the presence of rapid motion. They also proposed a "misattribution" hypothesis, which we interpret to mean that, when there is an actual motion signal, the dynamic signal from the flicker is misattributed to the motion stimulus, and hence no flicker is perceived. To understand this phenomenon, we developed a model incorporating a novel luminance flicker detector. We conducted experiments examining the relationship between rotational velocity (RV) and change rate (CR), and performed a systematic spectral analysis of the stimuli over a wide range of flicker and rotation rates. We then used the distributions of the spectral signatures of the dynamically changing stimuli to develop a computational model of silencing under the assumption that there is a motion energy threshold beyond which all temporal energy is attributed to motion. The model accurately captures the quantitative relationship between RV and CR for silencing: linear regression parameters are almost identical between humans and the model, which implies the misattribution hypothesis is likely correct. Specifically, we posit that, given limited resources to detect temporal change, all temporal change is interpreted as motion when a certain amount of actual motion exists.
This is understandable in an ecological context: the probable consequences of ignoring true motion (a "miss") are likely much greater than those of misinterpreting flicker as motion (a "false alarm"), given the relative rarity and importance of stationary flickering stimuli in the natural world.
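The thresholding assumption at the heart of the model can be caricatured in a few lines. This is a toy sketch only: the frame-difference energy summary and the threshold parameter below are illustrative assumptions, not the authors' actual spectral detector.

```python
import numpy as np

def silenced_flicker_energy(frames, motion_threshold):
    """Toy sketch of the misattribution idea: summarize temporal change as
    the mean squared frame difference; if it exceeds a motion-energy
    threshold, attribute ALL temporal change to motion, so the reported
    flicker energy is silenced to zero."""
    frames = np.asarray(frames, dtype=np.float64)
    temporal_energy = float(np.mean(np.diff(frames, axis=0) ** 2))
    if temporal_energy > motion_threshold:
        return 0.0            # all temporal change read as motion: silencing
    return temporal_energy    # below threshold: flicker is perceived

# A nearly static sequence with faint flicker stays below the threshold,
# so its (small) flicker energy is reported rather than silenced.
faint = np.zeros((4, 8, 8))
faint[1:, 0, 0] = 0.1
print(silenced_flicker_energy(faint, motion_threshold=1.0) > 0.0)  # → True
```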

13 citations

Journal ArticleDOI
TL;DR: The conspicuity maps generated are more consistent with human fixations than prior state-of-the-art models when tested on color image datasets and should prove useful for such diverse image processing tasks as quality assessment, segmentation, search, or compression.
Abstract: We propose an image conspicuity index that combines three factors: spatial dissimilarity, spatial distance and central bias. The dissimilarity between image patches is evaluated in a reduced dimensional principal component space and is inversely weighted by the spatial separations between patches. An additional weighting mechanism is deployed that reflects the bias of human fixations towards the image center. The method is tested on three public image datasets and a video clip to evaluate its performance. The experimental results indicate highly competitive performance despite the simple definition of the proposed index. The conspicuity maps generated are more consistent with human fixations than prior state-of-the-art models when tested on color image datasets. This is demonstrated using both receiver operator characteristics (ROC) analysis and the Kullback-Leibler distance metric. The method should prove useful for such diverse image processing tasks as quality assessment, segmentation, search, or compression. The high performance and relative simplicity of the conspicuity index relative to other much more complex models suggests that it may find wide usage.
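A minimal sketch of this style of conspicuity index, assuming non-overlapping patches, an SVD-based PCA, and illustrative parameter values (patch size, number of components, center-bias width) that are not taken from the paper:

```python
import numpy as np

def conspicuity(image, patch=8, n_components=10, sigma_frac=0.35):
    """Per-patch conspicuity: feature-space dissimilarity to all other
    patches, inversely weighted by spatial distance, then weighted by a
    Gaussian center bias. Returns a normalized patch-resolution map."""
    img = np.asarray(image, dtype=np.float64)
    H, W = img.shape
    # 1) Non-overlapping patches and their center coordinates.
    ph, pw = H // patch, W // patch
    feats, coords = [], []
    for i in range(ph):
        for j in range(pw):
            feats.append(img[i*patch:(i+1)*patch, j*patch:(j+1)*patch].ravel())
            coords.append(((i + 0.5) * patch, (j + 0.5) * patch))
    X, P = np.array(feats), np.array(coords)
    # 2) Dimensionality reduction: project onto top principal components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T
    # 3) Pairwise dissimilarity, inversely weighted by spatial separation.
    fdist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    sdist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    sal = (fdist / (1.0 + sdist)).sum(axis=1)
    # 4) Gaussian center-bias weighting toward the image center.
    c = np.array([H / 2.0, W / 2.0])
    bias = np.exp(-((P - c) ** 2).sum(axis=1)
                  / (2.0 * (sigma_frac * min(H, W)) ** 2))
    sal = sal * bias
    return (sal / (sal.max() + 1e-12)).reshape(ph, pw)
```

Upsampling the patch-resolution map back to image size would give a dense conspicuity map comparable against fixation data.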

13 citations

Journal ArticleDOI
TL;DR: This paper develops models that predict the effect of image quality on the detection of improvised explosive device components by bomb technicians in images taken using portable X-ray systems, and introduces a new suite of statistical task-prediction models believed to be the first NSS-based models for security X-ray images.
Abstract: Developing methods to predict how image quality affects task performance is a topic of great interest in many applications. While such studies have been performed in the medical imaging community, little work has been reported in the security X-ray imaging literature. In this paper, we develop models that predict the effect of image quality on the detection of improvised explosive device components by bomb technicians in images taken using portable X-ray systems. Using the newly developed NIST-LIVE X-Ray Task Performance Database, we created a set of objective algorithms that predict bomb technician detection performance based on measures of image quality. Our basic measures are traditional image quality indicators (IQIs) and perceptually relevant natural scene statistics (NSS)-based measures that have been extensively used in visible-light image quality prediction algorithms. We show that these measures are able to quantify the perceptual severity of degradations and can predict the performance of expert bomb technicians in identifying threats. Combining NSS- and IQI-based measures yields even better task performance prediction than either method independently. We also developed a new suite of statistical task prediction models that we refer to as quality inspectors of X-ray images (QUIX); we believe this is the first NSS-based model for security X-ray images. We also show that QUIX can be used to reliably predict conventional IQI metric values on distorted X-ray images.
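NSS-based quality measures of the kind referenced here typically start from mean-subtracted, contrast-normalized (MSCN) coefficients, whose histogram is near-Gaussian for pristine natural images and changes shape under distortion. A sketch of that computation (the 7-tap window and stabilizing constant C are conventional choices, not values from this paper):

```python
import numpy as np

def _gauss_kernel(ksize=7, sigma=7/6):
    # 1-D Gaussian kernel, normalized to sum to one.
    t = np.arange(ksize) - (ksize - 1) / 2.0
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def _sep_filter(img, k):
    # Separable Gaussian filter: convolve each row, then each column.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def mscn(image, ksize=7, sigma=7/6, C=1.0):
    """Mean-subtracted, contrast-normalized coefficients:
        (I - mu) / (sigma_local + C)
    where mu and sigma_local are Gaussian-weighted local mean and standard
    deviation. Distortion-sensitive NSS features are then fit to the
    distribution of these coefficients."""
    img = np.asarray(image, dtype=np.float64)
    k = _gauss_kernel(ksize, sigma)
    mu = _sep_filter(img, k)
    var = _sep_filter(img * img, k) - mu * mu
    sigma_local = np.sqrt(np.maximum(var, 0.0))
    return (img - mu) / (sigma_local + C)
```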

13 citations

Proceedings ArticleDOI
14 Jul 2014
TL;DR: The experimental results show that the combination of binocular contrast, structural dissimilarity and average luminance exhibits high consistency with subjective scores of visual discomfort, fusion difficulty and overall binocular mismatches in terms of Spearman's Rank Ordered Correlation Coefficient.
Abstract: Luminance discrepancies between image pairs occur owing to inconsistent parameters between stereoscopic camera devices and imperfect capture conditions. Such discrepancies induce binocular mismatches and affect the visual comfort felt by viewers, as well as their ability to fuse stereoscopic images. To better understand and observe this effect, we built a stereoscopic image database of 240 luminance-discrepancy images and 30 natural images with subjective scores of visual discomfort and fusion difficulty. Two features, binocular contrast and luminance similarity, were extracted to analyze the relationship between the subjective scores and the luminance discrepancies. Structural dissimilarity and average luminance are used to predict the effects of binocular mismatches. The experimental results show that the combination of binocular contrast, structural dissimilarity, and average luminance exhibits high consistency with subjective scores of visual discomfort, fusion difficulty, and overall binocular mismatch in terms of Spearman's rank-ordered correlation coefficient (SROCC).
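Spearman's rank-ordered correlation coefficient, the consistency score used here, is simply the Pearson correlation of the rank vectors of the two score lists. A minimal version (ties are not given averaged ranks, which the full definition handles):

```python
import numpy as np

def srocc(x, y):
    """Spearman's rank-ordered correlation: Pearson correlation of ranks.
    Monotonically related inputs score +/-1 regardless of nonlinearity."""
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(1, len(a) + 1)
        return r
    rx, ry = ranks(np.asarray(x, float)), ranks(np.asarray(y, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# A monotone (even highly nonlinear) relationship gives SROCC = 1.0
print(srocc([1.0, 2.0, 3.0, 4.0], [10.0, 100.0, 1000.0, 10000.0]))  # → 1.0
```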

13 citations

Proceedings ArticleDOI
06 Apr 2014
TL;DR: The proposed defog and visibility enhancer makes use of statistical regularities observed in foggy and fog-free images to extract the most visible information from three processed image results: one white balanced and two contrast enhanced images.
Abstract: We propose a referenceless perceptual defog and visibility enhancement model based on multiscale “fog aware” statistical features. Our model operates on a single foggy image and uses a set of “fog aware” weight maps to improve the visibility of foggy regions. The proposed defog and visibility enhancer makes use of statistical regularities observed in foggy and fog-free images to extract the most visible information from three processed image results: one white balanced and two contrast enhanced images. Perceptual fog density, fog aware luminance, contrast, saturation, chrominance, and saliency weight maps smoothly blend these via a Laplacian pyramid. Evaluation on a variety of foggy images shows that the proposed model achieves better results for darker, denser foggy images as well as on standard defog test images.
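The multiscale blending step can be sketched as follows. This is a generic Laplacian-pyramid weighted blend under simplifying assumptions (a cheap 1-2-1 binomial kernel, image sides divisible by 2^(levels-1), arbitrary weight maps), not the paper's fog-aware weight maps:

```python
import numpy as np

def _blur(img):
    # 1-2-1 separable binomial blur with edge replication.
    p = np.pad(img, ((1, 1), (0, 0)), mode="edge")
    v = (p[:-2] + 2.0 * p[1:-1] + p[2:]) / 4.0
    p = np.pad(v, ((0, 0), (1, 1)), mode="edge")
    return (p[:, :-2] + 2.0 * p[:, 1:-1] + p[:, 2:]) / 4.0

def _down(img):
    return _blur(img)[::2, ::2]

def _up(img):
    return _blur(np.repeat(np.repeat(img, 2, axis=0), 2, axis=1))

def pyramid_blend(images, weights, levels=3):
    """Blend source images with per-pixel weight maps: Laplacian pyramids
    of the sources are mixed per scale using Gaussian pyramids of the
    normalized weights, then the blended pyramid is collapsed. Mixing per
    scale avoids the seams and halos of direct per-pixel averaging."""
    w = np.stack([np.asarray(x, float) for x in weights]) + 1e-12
    w = w / w.sum(axis=0)              # weights sum to 1 at every pixel
    blended = []
    for img, wi in zip(images, w):
        g_img, g_w = [np.asarray(img, float)], [wi]
        for _ in range(levels - 1):
            g_img.append(_down(g_img[-1]))
            g_w.append(_down(g_w[-1]))
        # Laplacian pyramid: band-pass residuals plus the coarse low-pass top.
        lap = [g_img[i] - _up(g_img[i + 1]) for i in range(levels - 1)]
        lap.append(g_img[-1])
        contrib = [l * gw for l, gw in zip(lap, g_w)]
        blended = contrib if not blended else [a + b for a, b in zip(blended, contrib)]
    out = blended[-1]
    for band in reversed(blended[:-1]):
        out = _up(out) + band
    return out
```

In the paper's setting, the three processed images (white balanced and two contrast enhanced) would play the role of `images`, and the fog-aware maps the role of `weights`.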

13 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is validated against both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
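The structural similarity index combines luminance, contrast, and structure comparisons into one expression. A single-window sketch of the core formula (the published index computes it in an 11x11 Gaussian sliding window and averages the resulting local map; the K1/K2 constants below are the paper's defaults):

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """SSIM over one global window:
        ((2*mx*my + C1) * (2*cov + C2)) /
        ((mx^2 + my^2 + C1) * (vx + vy + C2))
    C1 and C2 stabilize the divisions (K1=0.01, K2=0.03)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2.0 * mx * my + C1) * (2.0 * cov + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))

img = np.random.default_rng(0).uniform(0, 255, (64, 64))
print(ssim_global(img, img))  # identical images → 1.0
```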

40,609 citations

Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
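The "learned loss" described above is, in the pix2pix paper, a conditional GAN objective paired with an L1 reconstruction term (G the generator, D the discriminator, x the input image, y the target, z noise, and lambda a weighting hyperparameter):

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\left[\log D(x,y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x,z))\right)\right]

G^{*} = \arg\min_{G}\max_{D}\;\mathcal{L}_{cGAN}(G,D)
  + \lambda\,\mathbb{E}_{x,y,z}\left[\lVert y - G(x,z)\rVert_{1}\right]
```

The L1 term pushes the output toward the ground truth at low frequencies, while the adversarial term supplies the high-frequency "realism" that hand-designed losses tend to blur away.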

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems, effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known especially about mud-dominated calciclastic submarine fan systems. Presented in this study is a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly sorted, chaotic, mud-supported floatstones. These

9,929 citations