scispace - formally typeset
Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher from the University of Texas at Austin. The author has contributed to research in topics: Image quality & Video quality. The author has an h-index of 102 and has co-authored 837 publications receiving 96,088 citations. Previous affiliations of Alan C. Bovik include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper presents a design of real-time implementable full-reference image quality algorithms based on the SSIM index and multi-scale SSIM (MS-SSIM) index. The algorithms are tested on the LIVE image quality database and shown to yield performance commensurate with SSIM and MS-SSIM but with much lower computational complexity.
Abstract: The development of real-time image quality assessment algorithms is an important direction on which little research has focused. This paper presents a design of real-time implementable full-reference image quality algorithms based on the SSIM index [2] and multi-scale SSIM (MS-SSIM) index [3]. The proposed algorithms, which modify SSIM/MS-SSIM to achieve speed, are tested on the LIVE image quality database [13] and shown to yield performance commensurate with SSIM and MS-SSIM but with much lower computational complexity.
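As a concrete point of reference, the single-scale SSIM computation these algorithms accelerate can be sketched with whole-image statistics. This coarse global-window simplification is illustrative only; it is not the specific fast algorithm proposed in the paper, which modifies the windowing and arithmetic differently.

```python
import numpy as np

def global_ssim(x, y, L=255.0, K1=0.01, K2=0.03):
    """SSIM computed from whole-image statistics.

    The original SSIM uses an 11x11 sliding Gaussian window; replacing it
    with one set of global statistics is a speed-oriented simplification
    for illustration, not the paper's exact algorithm.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()  # cross-covariance
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Identical images score 1; any distortion pulls the score below 1.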

33 citations

Journal ArticleDOI
TL;DR: A novel FR-IQA framework that dynamically generates receptive fields responsive to distortion type is proposed that achieves state-of-the-art prediction accuracy on various open IQA databases.
Abstract: Most full-reference image quality assessment (FR-IQA) methods advanced to date have been holistically designed without regard to the type of distortion impairing the image. However, the perception of distortion depends nonlinearly on the distortion type. Here we propose a novel FR-IQA framework that dynamically generates receptive fields responsive to distortion type. Our proposed method, the dynamic receptive field generation based image quality assessor (DRF-IQA), separates the process of FR-IQA into two streams: 1) dynamic error representation and 2) visual sensitivity-based quality pooling. The first stream generates dynamic receptive fields on the input distorted image, implemented by a trained convolutional neural network (CNN); the generated receptive field profiles are then convolved with the distorted and reference images, and the responses are differenced to produce spatial error maps. In the second stream, a visual sensitivity map is generated and used to weight the spatial error map. The experimental results show that the proposed model achieves state-of-the-art prediction accuracy on various open IQA databases.
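The two-stream idea can be sketched with fixed stand-ins for the learned components: a hand-picked kernel in place of the CNN-generated receptive field, and a given sensitivity map in place of the learned one. All names here are illustrative, not the authors' code.

```python
import numpy as np

def sensitivity_weighted_score(ref, dist, rf, sens):
    """Two-stream pooling in the spirit of DRF-IQA.

    Stream 1: filter reference and distorted images with a receptive-field
    profile `rf` and difference the responses to form a spatial error map.
    Stream 2: pool that map under a visual sensitivity weighting `sens`.
    In the paper both `rf` and `sens` come from trained CNNs, per distortion.
    """
    def conv2_valid(img, k):
        # naive 'valid'-mode 2-D correlation with a small kernel
        kh, kw = k.shape
        H, W = img.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = float((img[i:i + kh, j:j + kw] * k).sum())
        return out

    err = np.abs(conv2_valid(ref, rf) - conv2_valid(dist, rf))
    w = sens / sens.sum()          # normalize sensitivity weights
    return float((w * err).sum())  # lower score = higher predicted quality
```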

33 citations

Proceedings ArticleDOI
TL;DR: A deep belief network is designed that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction.
Abstract: Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

33 citations

Journal ArticleDOI
TL;DR: This model takes into account important aspects of video compression such as transform coding, motion compensation, and variable-length coding, and estimates distortion within 1.5 dB of actual simulation values in terms of peak signal-to-noise ratio (PSNR).
Abstract: Multimedia communication has become one of the main applications in commercial wireless systems. Multimedia sources, mainly consisting of digital images and videos, have high bandwidth requirements. Since bandwidth is a valuable resource, it is important that its use be optimized for image and video communication. Therefore, interest in developing new joint source-channel coding (JSCC) methods for image and video communication is increasing. Design of any JSCC scheme requires an estimate of the distortion at different source coding rates and under different channel conditions. The common approach to obtain this estimate is via simulations or operational rate-distortion curves. These approaches, however, are computationally intensive and, hence, not feasible for real-time coding and transmission applications. A more feasible approach is to develop models that predict distortion at different source coding rates and under different channel conditions. Based on this idea, we present a distortion model for estimating the distortion due to quantization and channel errors in MPEG-4 compressed video streams at different source coding rates and channel bit error rates. This model takes into account important aspects of video compression such as transform coding, motion compensation, and variable-length coding. Results show that our model estimates distortion within 1.5 dB of actual simulation values in terms of peak signal-to-noise ratio (PSNR).
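The reported accuracy is stated on the PSNR scale; converting a predicted mean-squared-error distortion to PSNR is a one-line calculation (the function name is mine, not from the paper):

```python
import math

def psnr_from_mse(mse, peak=255.0):
    # peak signal-to-noise ratio (in dB) from a mean-squared-error
    # distortion estimate; 'peak' is the maximum pixel value
    return 10.0 * math.log10(peak * peak / mse)
```

A 10x reduction in predicted MSE corresponds to a 10 dB gain in PSNR, which is why model error is naturally quoted in dB.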

33 citations

Proceedings ArticleDOI
25 Mar 2002
TL;DR: The discrimination image paradigm and principal component analysis are presented, providing valuable low-level criteria for executing human-like scanpaths in such machine vision systems.
Abstract: In this paper, we present two techniques to reveal image features that attract the eye during visual search: the discrimination image paradigm and principal component analysis. In preliminary experiments, we employed these techniques to identify image features used to locate simple targets embedded in 1/f noise. Two main findings emerged. First, the loci of fixations were not random but were driven by local image features, even in very noisy displays. Second, subjects often searched for a component feature of a target rather than the target itself, even if the target was a simple geometric form. Moreover, the particular relevant component varied from individual to individual. Also, principal component analysis of the noise patches at the point of fixation reveals global image features used by the subject in the search task. In addition to providing insight into the human visual system, these techniques have relevance for machine vision as well. The efficacy of a foveated machine vision system largely depends on its ability to actively select 'visually interesting' regions in its environment. The techniques presented in this paper provide valuable low-level criteria for executing human-like scanpaths in such machine vision systems.
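The discrimination-image idea itself is simple to state in code: average the noise patterns grouped by the observer's responses and difference the means. A hedged sketch with synthetic trial data (the paper applies the same idea to noise patches at fixation loci, followed by PCA):

```python
import numpy as np

def discrimination_image(noise_patches, chose_target):
    """Classification-image style estimate of the feature driving responses.

    Average the noise patches on trials where the observer reported the
    target, subtract the average over the remaining trials; structure in
    the difference reveals the template the observer was actually using.
    """
    noise_patches = np.asarray(noise_patches, dtype=np.float64)
    chose_target = np.asarray(chose_target, dtype=bool)
    return noise_patches[chose_target].mean(axis=0) - \
           noise_patches[~chose_target].mean(axis=0)
```

With enough trials, pixels the observer actually relied on stand out against the near-zero background of irrelevant pixels.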

33 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its performance is compared against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
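The index combines three perceptual comparisons, of luminance, contrast, and structure, which can be sketched from image statistics. This uses whole-image statistics for brevity; the published index evaluates the terms in a sliding 11x11 Gaussian window and averages the resulting map.

```python
import numpy as np

def ssim_components(x, y, L=255.0, K1=0.01, K2=0.03):
    """Luminance, contrast, and structure terms of the SSIM index.

    Their product gives the familiar SSIM formula; each term equals 1
    when the two images agree on that attribute.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)  # luminance
    con = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)  # contrast
    st = (sxy + C3) / (sx * sy + C3)                     # structure
    return lum, con, st
```

Contrast inversion leaves luminance and contrast terms high but drives the structure term negative, which is the sense in which SSIM measures structural degradation.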

40,609 citations

Book
01 Jan 1998
TL;DR: A textbook tour of wavelet signal processing: Fourier analysis, time-frequency representations, frames, wavelet bases, wavelet packets and local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
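The learned-loss idea pairs an adversarial term with an L1 reconstruction term on the paired target; the paper weights the L1 term with lambda = 100. The array-level sketch below is illustrative, not the authors' pix2pix code.

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Generator objective of the pix2pix conditional GAN (sketch).

    `d_fake` holds the conditional discriminator's probabilities that
    generated outputs are real; the generator wants them near 1 (the
    non-saturating GAN loss), while the L1 term keeps outputs close to
    the paired ground-truth image.
    """
    d_fake = np.asarray(d_fake, dtype=np.float64)
    eps = 1e-12  # guard against log(0)
    gan = -np.log(d_fake + eps).mean()
    l1 = np.abs(np.asarray(fake, dtype=np.float64) -
                np.asarray(target, dtype=np.float64)).mean()
    return gan + lam * l1
```

A generator that both fools the discriminator and matches the target drives the loss toward zero; failing either objective inflates it.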

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems, which can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known especially about mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveal a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones.

9,929 citations