Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher from the University of Texas at Austin. The author has contributed to research in topics including image quality and video quality. The author has an h-index of 102 and has co-authored 837 publications receiving 96,088 citations. Previous affiliations of Alan C. Bovik include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Journal Article
TL;DR: An entropy minimization algorithm is derived and it is found that it performs optimally at reducing total contrast uncertainty and that it also works well at reducing the mean squared error between the original image and the image reconstructed from the multiple fixations.
Abstract: The human visual system combines a wide field of view with a high-resolution fovea and uses eye, head, and body movements to direct the fovea to potentially relevant locations in the visual scene. This strategy is sensible for a visual system with limited neural resources. However, for this strategy to be effective, the visual system needs sophisticated central mechanisms that efficiently exploit the varying spatial resolution of the retina. To gain insight into some of the design requirements of these central mechanisms, we have analyzed the effects of variable spatial resolution on local contrast in 300 calibrated natural images. Specifically, for each retinal eccentricity (which produces a certain effective level of blur), and for each value of local contrast observed at that eccentricity, we measured the probability distribution of the local contrast in the unblurred image. These conditional probability distributions can be regarded as posterior probability distributions for the “true” unblurred contrast, given an observed contrast at a given eccentricity. We find that these conditional probability distributions are adequately described by a few simple formulas. To explore how these statistics might be exploited by central perceptual mechanisms, we consider the task of selecting successive fixation points, where the goal on each fixation is to maximize total contrast information gained about the image (i.e., minimize total contrast uncertainty). We derive an entropy minimization algorithm and find that it performs optimally at reducing total contrast uncertainty and that it also works well at reducing the mean squared error between the original image and the image reconstructed from the multiple fixations. Our results show that measurements of local contrast alone could efficiently drive the scan paths of the eye when the goal is to gain as much information about the spatial structure of a scene as possible.

53 citations

Journal Article
TL;DR: The magnitudes of luminance and range (disparity) coefficients show a clear positive correlation, which means that, at a location with larger luminance variation, there is a higher probability of a larger range (disparity) variation.
Abstract: We studied the empirical distributions of luminance, range and disparity wavelet coefficients using a coregistered database of luminance and range images. The marginal distributions of range and disparity are observed to have high peaks and heavy tails, similar to the well-known properties of luminance wavelet coefficients. However, we found that the kurtosis of range and disparity coefficients is significantly larger than that of luminance coefficients. We used generalized Gaussian models to fit the empirical marginal distributions. We found that the marginal distributions of luminance coefficients have a shape parameter p between 0.6 and 0.8, while range and disparity coefficients have much smaller parameters p < 0.32, corresponding to a much higher peak. We also examined the conditional distributions of luminance, range and disparity coefficients. The magnitudes of luminance and range (disparity) coefficients show a clear positive correlation, which means that, at a location with larger luminance variation, there is a higher probability of a larger range (disparity) variation. We also used generalized Gaussians to model the conditional distributions of luminance and range (disparity) coefficients. The values of the two shape parameters (p,s) reflect the observed luminance-range (disparity) dependency. As an example of the usefulness of luminance statistics conditioned on range statistics, we modified a well-known Bayesian stereo ranging algorithm using our natural scene statistics models, which improved its performance.

53 citations
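
The link the abstract above draws between a small shape parameter and a high, heavy-tailed peak can be checked numerically. The sketch below assumes the common zero-mean generalized Gaussian form f(x) ∝ exp(-(|x|/s)^p), which the abstract does not spell out, and the function name is illustrative only. Under that assumption the kurtosis depends on p alone and grows as p shrinks.

```python
import math

def generalized_gaussian_kurtosis(p):
    """Kurtosis of a zero-mean generalized Gaussian with shape parameter p.

    Assumes the density f(x) proportional to exp(-(|x| / s) ** p); the scale s
    cancels out, so the result depends on p alone.
    """
    return math.gamma(5.0 / p) * math.gamma(1.0 / p) / math.gamma(3.0 / p) ** 2

# p = 2 is the Gaussian case and p = 1 the Laplacian case; the remaining
# values are the shape parameters quoted in the abstract.
for p in (2.0, 1.0, 0.8, 0.6, 0.32):
    print(f"p = {p}: kurtosis = {generalized_gaussian_kurtosis(p):.1f}")
```

The Gaussian and Laplacian cases serve as sanity checks, since their kurtosis values (3 and 6) are well known.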

Journal Article
TL;DR: This work proposes a first of a kind continuous QoE prediction engine based on a nonlinear autoregressive model with exogenous outputs that is driven by an objective measure of perceptual video quality, rebuffering-aware information, and a QoE memory descriptor that accounts for recency.
Abstract: Streaming video data accounts for a large portion of mobile network traffic. Given the throughput and buffer limitations that currently affect mobile streaming, compression artifacts and rebuffering events commonly occur. Being able to predict the effects of these impairments on perceived video quality of experience (QoE) could lead to improved resource allocation strategies enabling the delivery of higher quality video. Toward this goal, we propose a first of a kind continuous QoE prediction engine. Prediction is based on a nonlinear autoregressive model with exogenous outputs. Our QoE prediction model is driven by three QoE-aware inputs: an objective measure of perceptual video quality, rebuffering-aware information, and a QoE memory descriptor that accounts for recency. We evaluate our method on a recent QoE dataset containing continuous time subjective scores.

53 citations

Proceedings Article
07 Jun 2004
TL;DR: This work discovered CI templates that indeed resembled the target by analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, and demonstrated that these CI templates are useful in predicting stimulus regions that draw human fixations in search tasks.
Abstract: Seemingly complex tasks like visual search can be analyzed using a cognition-free, bottom-up framework. We sought to reveal strategies used by observers in visual search tasks using accurate eye tracking and image analysis at point of gaze. Observers were instructed to search for simple geometric targets embedded in 1/f noise. By analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, we discovered CI templates that indeed resembled the target. No such structure emerged for a random-searcher. We demonstrate, qualitatively and quantitatively, that these CI templates are useful in predicting stimulus regions that draw human fixations in search tasks. Filtering a 1/f noise stimulus with a CI results in a ‘fixation prediction map’. A qualitative evaluation of the prediction was obtained by overlaying k-means clusters of observers’ fixations on the prediction map. The fixations clustered around the local maxima in the prediction map. To obtain a quantitative comparison, we computed the Kullback-Leibler distance between the recorded fixations and the prediction. Using random-searcher CIs in Monte Carlo simulations, a distribution of this distance was obtained. The z-scores for the human CIs and the original target were -9.70 and -9.37 respectively, indicating that even in noisy stimuli, observers deploy their fixations efficiently to likely targets rather than casting them randomly hoping to fortuitously find the target.

52 citations
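
The quantitative comparison in the abstract above combines two standard pieces: a Kullback-Leibler distance between a fixation distribution and a prediction map, and a z-score of that distance against a Monte Carlo null distribution. The following is a minimal sketch of both on binned data. The function names and toy numbers are hypothetical, and the paper's actual binning and simulation procedure are not described in the abstract.

```python
import math
import statistics

def normalize(values):
    """Scale non-negative bin values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) in nats.

    p and q are probabilities over the same bins; bins where p is zero
    contribute nothing, and q must be positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def z_score(observed, null_samples):
    """Standardize an observed distance against a null distribution."""
    mean = statistics.mean(null_samples)
    return (observed - mean) / statistics.stdev(null_samples)

# Hypothetical toy data: fixation counts per image region and the
# corresponding prediction-map mass in the same regions.
fixation_counts = [1, 6, 2, 1]
prediction_map = [0.15, 0.55, 0.2, 0.1]
distance = kl_divergence(normalize(fixation_counts), normalize(prediction_map))
print(distance)
```

A strongly negative z-score, as reported in the abstract, means the observed distance is far smaller than the distances produced by the random-searcher simulations.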

Proceedings Article
13 Nov 1994
TL;DR: The ADP has a superior ability to subdivide the image into integral groupings, minimizing the error in boundary localization and in pixel intensity, and an application to segmentation of remotely sensed data is provided.
Abstract: We introduce the Anisotropic Diffusion Pyramid (ADP), a structure for multiresolution image processing. We also develop the ADP for use in region-based segmentation. The pyramid is constructed using the anisotropic diffusion equations, creating an efficient scale-space representation. Segmentation is accomplished using pyramid node linking. Since anisotropic diffusion preserves edge localization as the scale is increased, the region boundaries in the coarse-to-fine ADP segmentation are accurately delineated. An application to segmentation of remotely sensed data is provided. The results of ADP segmentation are compared to Gaussian-based pyramidal segmentation. The examples show that the ADP has a superior ability to subdivide the image into integral groupings, minimizing the error in boundary localization and in pixel intensity.

52 citations
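
The edge-preserving property that the abstract above relies on can be seen in a few lines. The abstract does not say which diffusion equations are used, so the sketch below assumes the widely used Perona-Malik formulation with an exponential conduction function, applied to a 1-D signal rather than an image. It omits the pyramid construction and node linking entirely, and the parameter names `kappa` and `lam` are illustrative.

```python
import math

def perona_malik_1d(signal, kappa, lam, iterations):
    """Explicit Perona-Malik diffusion on a 1-D signal (assumed formulation).

    kappa sets the gradient magnitude treated as an edge, and lam is the
    step size (at most 0.5 for stability with two neighbours).
    """
    u = [float(v) for v in signal]
    for _ in range(iterations):
        updated = u[:]
        for i in range(len(u)):
            # Differences to each neighbour; zero flux at the boundaries.
            left = u[i - 1] - u[i] if i > 0 else 0.0
            right = u[i + 1] - u[i] if i < len(u) - 1 else 0.0
            # Conduction drops towards zero across large gradients (edges).
            c_left = math.exp(-((left / kappa) ** 2))
            c_right = math.exp(-((right / kappa) ** 2))
            updated[i] = u[i] + lam * (c_left * left + c_right * right)
        u = updated
    return u

# Two noisy plateaus separated by a large step.
noisy_step = [0.0, 0.1, 0.0, 0.1, 0.0, 10.0, 10.1, 10.0, 10.1, 10.0]
smoothed = perona_malik_1d(noisy_step, kappa=1.0, lam=0.25, iterations=20)
print([round(v, 3) for v in smoothed])
```

The small fluctuations inside each plateau are averaged away while the step between the plateaus stays in place, which is the behaviour a Gaussian pyramid would not preserve.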


Cited by
Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

40,609 citations
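
The abstract above does not state the index itself. As a rough illustration, the sketch below uses the commonly cited single-scale SSIM expression, which combines means, variances, and covariance with two stabilizing constants. It is taken from general knowledge of the published index rather than from the text above, and it evaluates one global window over flat pixel lists, whereas the published method works on local windows and averages the results. The function name and default constants are assumptions of this sketch.

```python
def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length pixel sequences.

    Simplification: one global window instead of averaged local windows.
    """
    n = len(x)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased variance and covariance estimates.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical pixel values for a reference and a distorted patch.
reference = [10, 50, 90, 130, 170, 210]
distorted = [30, 40, 120, 100, 200, 190]
print(global_ssim(reference, reference), global_ssim(reference, distorted))
```

Identical inputs score 1, and the score falls as the means, contrasts, or correlation structure of the two patches diverge.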

Book
01 Jan 1998
TL;DR: A book covering Fourier analysis, time-frequency methods, frames, wavelet bases, wavelet packet and local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings Article
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems; the approach can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal Article
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known, especially about mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveal a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations