Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher at the University of Texas at Austin. The author has contributed to research on the topics of image quality and video quality. The author has an h-index of 102 and has co-authored 837 publications receiving 96088 citations. Previous affiliations of Alan C. Bovik include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Book Chapter
01 Jan 2018
TL;DR: In this chapter, a systematic framework for optimization with respect to a perceptual quality assessment algorithm is presented and the Structural SIMilarity (SSIM) index is the representative image quality assessment model that is studied.
Abstract: The fact that multimedia services have become the major driver for next generation wireless networks underscores their technological and economic impact. A vast majority of these multimedia services are consumer-centric and therefore must guarantee a certain level of perceptual quality. Given the massive volumes of image and video data in question, it is only natural to adopt automatic quality prediction and optimization tools. The past decade has seen the invention of several excellent automatic quality prediction tools for natural images and videos. While these tools predict perceptual quality scores accurately, they do not necessarily lend themselves to standard optimization techniques. In this chapter, a systematic framework for optimization with respect to a perceptual quality assessment algorithm is presented. The Structural SIMilarity (SSIM) index, which has found vast commercial acceptance owing to its high performance and low complexity, is the representative image quality assessment model that is studied. Specifically, a detailed exposition of the mathematical properties of the SSIM index is presented first, followed by a discussion on the design of linear and non-linear SSIM-optimal image restoration algorithms.

14 citations

Journal Article
TL;DR: In this paper, a generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes.
Abstract: We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands, and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models. The features used in GREED are highly generalizable and obtain competitive performance even on standard, non-HFR VQA databases. The implementation of GREED has been made available online: https://github.com/pavancm/GREED .
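The entropy computation mentioned in this abstract can be illustrated with a minimal sketch. It shows only the closed-form differential entropy of a zero-mean generalized Gaussian and the absolute difference between two such entropies; it is not the GREED implementation, which combines features across several spatial and temporal subbands with a learned regressor. The function names `ggd_entropy` and `entropy_difference`, and the (scale, shape) parameterization, are assumptions made for this illustration.

```python
import math


def ggd_entropy(alpha: float, beta: float) -> float:
    """Differential entropy (in nats) of a zero-mean generalized Gaussian.

    Assumed density: f(x) = beta / (2 * alpha * Gamma(1 / beta))
                            * exp(-(|x| / alpha) ** beta)
    where alpha > 0 is the scale and beta > 0 is the shape.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # h = 1/beta + ln(2 * alpha * Gamma(1/beta) / beta), computed in log space
    return (1.0 / beta + math.log(2.0 * alpha)
            + math.lgamma(1.0 / beta) - math.log(beta))


def entropy_difference(ref_params, dist_params) -> float:
    """Absolute entropy gap between two fitted GGDs, each given as (alpha, beta).

    Hypothetical helper: a stand-in for a single entropic-difference feature.
    """
    return abs(ggd_entropy(*ref_params) - ggd_entropy(*dist_params))
```

With beta = 2 the density reduces to a Gaussian with variance alpha^2 / 2, and with beta = 1 to a Laplacian, which gives two convenient sanity checks for the formula.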

14 citations

Journal Article
TL;DR: This article introduces the LIVE-NFLX-II database, which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content.
Abstract: Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and objective QoE prediction models. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device. The new database focuses on low bandwidth conditions which are more challenging for bitrate adaptation algorithms, which often must navigate tradeoffs between rebuffering and video quality. Using our database, we study the effects of multiple streaming dimensions on user experience and evaluate video quality and quality of experience models and analyze their strengths and weaknesses. We believe that the tools introduced here will help inspire further progress on the development of perceptually-optimized client adaptation and video streaming strategies. The database is publicly available at http://live.ece.utexas.edu/research/LIVE_NFLX_II/live_nflx_plus.html .

14 citations

Journal Article
TL;DR: This work proposes a new closed-form spatial-oriented correlation model that captures statistical regularities between perceptually decomposed natural image luminance samples and validates the new correlation model on a variety of natural images.
Abstract: Most prevalent statistical models of natural images characterize only the univariate distributions of divisively normalized bandpass responses or wavelet-like decompositions of them. However, the higher-order dependencies between spatially neighboring responses are not yet well understood. Towards filling this gap, we propose a new closed-form spatial-oriented correlation model that captures statistical regularities between perceptually decomposed natural image luminance samples. We validate the new correlation model on a variety of natural images. Experimental results demonstrate the robustness of the new correlation model across image content. A software release that implements the new closed-form spatial-oriented correlation model is available at http://live.ece.utexas.edu/research/3dnss/bicorr_release.zip.

14 citations

Journal Article
TL;DR: This article studies and analyzes the statistics of both pristine and distorted bandpass X-ray images, and devises an application of NSS models to an image modality classification task, whereby VL, X-ray, infrared, and millimeter-wave images can be effectively and automatically distinguished.
Abstract: In this article, we have studied and analyzed the statistics of both pristine and distorted bandpass X-ray images. In the past, we have shown that the statistics of natural, bandpass-filtered visible light (VL) pictures, commonly expressed by natural scene statistic (NSS) models, can be used to create remarkably powerful, perceptually relevant predictors of perceptual picture quality. We find that similar models can be developed that apply quite well to X-ray image data. We have also studied the potential of applying these statistical X-ray NSS models to the design of algorithms for automatic image quality prediction of X-ray images, such as might occur in security, medicine, and material inspection applications. As a demonstration of the discrimination power of these models, we devised an application of NSS models to an image modality classification task, whereby VL, X-ray, infrared, and millimeter-wave images can be effectively and automatically distinguished. Our study is conducted on a dataset of X-ray images made available by the National Institute of Standards and Technology.

14 citations


Cited by
Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
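To make the structural similarity index concrete, here is a simplified sketch that computes a single SSIM value from global image statistics. It is an illustration under stated assumptions rather than the authors' released MATLAB code: the published index is computed over local windows and averaged across the image, whereas this version treats the whole image as one window. The function name `global_ssim` and its flat-sequence input are assumptions; the constants K1 = 0.01 and K2 = 0.03 are the commonly cited defaults.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM between two equally sized, flattened images.

    Simplification: statistics are taken over the whole image instead of
    over local windows, so this is not the full mean-SSIM map.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Small constants that stabilize the ratios when means or variances
    # are close to zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    luminance = (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2.0 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure
```

Identical inputs give a value of 1, and any difference in mean, variance or correlation pulls the value below 1, which is the behaviour the index relies on when it is used as a quality score.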

40,609 citations

Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings Article
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems; the approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal Article
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations