Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher at the University of Texas at Austin. He has contributed to research in the topics of image quality and video quality, has an h-index of 102, and has co-authored 837 publications receiving 96,088 citations. His previous affiliations include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Journal Article
TL;DR: This paper presents a vision for the future of visual quality assessment research: it introduces the area of quality assessment, states its relevance, describes current standards for gauging algorithmic performance, and defines the terms used throughout the paper.
Abstract: Creating algorithms capable of predicting the perceived quality of a visual stimulus defines the field of objective visual quality assessment (QA). The field of objective QA has received tremendous attention in the recent past, with many successful algorithms being proposed for this purpose. Our concern here is not with the past, however; in this paper we discuss our vision for the future of visual quality assessment research. We first introduce the area of quality assessment and state its relevance. We describe current standards for gauging algorithmic performance and define terms that we will use throughout this paper. We then journey through 2D image and video quality assessment. We summarize recent approaches to these problems and discuss in detail our vision for future research on the problems of full-reference and no-reference 2D image and video quality assessment. From there, we move on to the currently popular area of 3D QA. We discuss recent databases, algorithms and 3D quality of experience. This yet-nascent technology provides for tremendous scope in terms of research activities and we summarize each of them. We then move on to more esoteric topics such as algorithmic assessment of aesthetics in natural images and in art. We discuss current research and hypothesize about possible paths to tread. Towards the end of this article, we discuss some other areas of interest including high-definition (HD) quality assessment, immersive environments and so on before summarizing interesting avenues for future work in multimedia (i.e., audio-visual) quality assessment.
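The "current standards for gauging algorithmic performance" referenced in this abstract are, in practice, correlation statistics between an algorithm's predictions and subjective mean opinion scores (MOS). A minimal sketch in Python, with purely illustrative scores not drawn from any cited study:

```python
# Standard QA performance measures: Spearman rank-order correlation (SROCC,
# prediction monotonicity) and Pearson linear correlation (PLCC, linear
# agreement, usually computed after a nonlinear logistic mapping).
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([2.1, 3.5, 4.2, 1.8, 3.9])     # subjective scores (hypothetical)
pred = np.array([2.4, 3.1, 4.5, 1.5, 3.7])    # objective QA predictions (hypothetical)

srocc, _ = spearmanr(mos, pred)
plcc, _ = pearsonr(mos, pred)
print(f"SROCC = {srocc:.3f}, PLCC = {plcc:.3f}")
```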

119 citations

Journal Article
TL;DR: This paper linearizes error diffusion algorithms by modeling the quantizer as a linear gain plus additive noise, and quantifies the two primary effects of error diffusion: edge sharpening and noise shaping.
Abstract: Digital halftoning quantizes a graylevel image to one bit per pixel. Halftoning by error diffusion reduces local quantization error by filtering the quantization error in a feedback loop. In this paper, we linearize error diffusion algorithms by modeling the quantizer as a linear gain plus additive noise. We confirm the accuracy of the linear model in three independent ways. Using the linear model, we quantify the two primary effects of error diffusion: edge sharpening and noise shaping. For each effect, we develop an objective measure of its impact on the subjective quality of the halftone. Edge sharpening is proportional to the linear gain, and we give a formula to estimate the gain from a given error filter. In quantifying the noise, we modify the input image to compensate for the sharpening distortion and apply a perceptually weighted signal-to-noise ratio to the residual of the halftone and modified input image. We compute the correlation between the residual and the original image to show when the residual can be considered signal independent. We also compute a tonality measure similar to total harmonic distortion. We use the proposed measures for edge sharpening, noise shaping, and tonality to evaluate the quality of error diffusion algorithms.
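To make the system under analysis concrete, here is a minimal sketch of halftoning by error diffusion with the classic Floyd–Steinberg weights (a representative choice, not necessarily the filter analyzed in the paper); the paper's linear model approximates the 1-bit quantizer inside this feedback loop as a linear gain plus additive noise:

```python
# Error diffusion: quantize each pixel to 1 bit, then feed the quantization
# error forward to unprocessed neighbors (Floyd-Steinberg weights assumed).
import numpy as np

def error_diffuse(img):
    x = img.astype(float).copy()      # grayscale image scaled to [0, 1]
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = 1.0 if x[i, j] >= 0.5 else 0.0   # 1-bit quantizer
            e = x[i, j] - out[i, j]                      # quantization error
            if j + 1 < w:
                x[i, j + 1] += e * 7 / 16
            if i + 1 < h:
                if j > 0:
                    x[i + 1, j - 1] += e * 3 / 16
                x[i + 1, j] += e * 5 / 16
                if j + 1 < w:
                    x[i + 1, j + 1] += e * 1 / 16
    return out
```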

117 citations

Proceedings Article
04 Apr 2022
TL;DR: This paper introduces an efficient and scalable attention model consisting of two aspects, blocked local and dilated global attention, and shows that the resulting MaxViT blocks express strong generative modeling capability on ImageNet, demonstrating their potential as a universal vision module.
Abstract: Transformers have recently gained significant attention in the computer vision community. However, the lack of scalability of self-attention mechanisms with respect to image size has limited their wide adoption in state-of-the-art vision backbones. In this paper we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. We also present a new architectural element by effectively blending our proposed attention model with convolutions, and accordingly propose a simple hierarchical vision backbone, dubbed MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to "see" globally throughout the entire network, even in earlier, high-resolution stages. We demonstrate the effectiveness of our model on a broad spectrum of vision tasks. On image classification, MaxViT achieves state-of-the-art performance under various settings: without extra data, MaxViT attains 86.5% ImageNet-1K top-1 accuracy; with ImageNet-21K pre-training, our model achieves 88.7% top-1 accuracy. For downstream tasks, MaxViT as a backbone delivers favorable performance on object detection as well as visual aesthetic assessment. We also show that our proposed model expresses strong generative modeling capability on ImageNet, demonstrating the superior potential of MaxViT blocks as a universal vision module. The source code and trained models will be available at https://github.com/google-research/maxvit.
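The "blocked local and dilated global" decomposition can be illustrated with the two tensor partitions sketched below; the attention computation itself is elided, and the function names are ours, not the paper's:

```python
# Sketch of the two partitions behind multi-axis attention: "block" gathers
# non-overlapping p x p local windows; "grid" gathers g x g sets of tokens
# strided across the whole feature map (a dilated, global pattern).
import numpy as np

def block_partition(x, p):
    # (H, W, C) -> (num_windows, p*p, C): local p x p windows
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p, C)

def grid_partition(x, g):
    # (H, W, C) -> (num_groups, g*g, C): each group spans the full image
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, g * g, C)
```

With fixed window size p and grid size g, self-attention within each group costs a constant amount per token, which is what yields the linear overall complexity in image size that the abstract claims.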

114 citations

Journal Article
TL;DR: This work conducts a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and objective VQA model design.
Abstract: Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms. Accordingly, there is a great need for accurate video quality assessment (VQA) models for UGC/consumer videos to monitor, control, and optimize this vast content. Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of UGC content are unpredictable, complicated, and often commingled. Here we contribute to advancing the UGC-VQA problem by conducting a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and VQA model design. By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models to create a new fusion-based BVQA model, which we dub the VIDeo quality EVALuator (VIDEVAL), that effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models. Our study protocol also defines a reliable benchmark for the UGC-VQA problem, which we believe will facilitate further research on deep learning-based VQA modeling, as well as perceptually-optimized efficient UGC video processing, transcoding, and streaming. To promote reproducible research and public evaluation, an implementation of VIDEVAL has been made available online: this https URL.
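The fusion recipe the abstract describes (select a small subset of existing BVQA features, then regress subjective scores on them) can be sketched as below. This stands in scikit-learn's SelectKBest and an SVR on synthetic data purely for illustration; VIDEVAL's actual feature set and selection strategy differ:

```python
# Hypothetical stand-in for the select-then-fuse pipeline: keep 60 of 763
# candidate features, then learn a regressor from features to MOS.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 763))     # per-video feature vectors (synthetic)
y = rng.uniform(1, 5, size=200)     # mean opinion scores (synthetic)

model = make_pipeline(
    SelectKBest(f_regression, k=60),   # 60 of 763 features, as in the abstract
    StandardScaler(),
    SVR(kernel="rbf"),
)
model.fit(X, y)
print(model.predict(X[:3]))
```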

113 citations

Proceedings Article
07 May 2001
TL;DR: A method for DCT-domain blind measurement of blocking artifacts: by constituting a new block across any two adjacent blocks, the blocking artifact is modeled as a 2-D step function.
Abstract: A method for DCT-domain blind measurement of blocking artifacts is proposed. By constituting a new block across any two adjacent blocks, the blocking artifact is modeled as a 2-D step function. A fast DCT-domain algorithm has been derived to constitute the new block and extract all parameters needed. Then a human visual system (HVS)-based measurement of blocking artifacts is conducted. Experimental results have shown the effectiveness and stability of our method. The proposed technique can be used for online image/video quality monitoring and control in applications of DCT-domain image/video processing.
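For intuition, the sketch below computes a much simpler pixel-domain blockiness score: the mean luminance jump across 8x8 block boundaries relative to the jump elsewhere. The cited method instead models each boundary as a 2-D step function and estimates its parameters directly in the DCT domain; this spatial analogue only conveys the idea of a blind (no-reference) measurement:

```python
# Illustrative blind blockiness estimate in the pixel domain (not the paper's
# DCT-domain algorithm): ratio of boundary to non-boundary gradient energy.
# Assumes image dimensions are multiples of the block size b.
import numpy as np

def blockiness(img, b=8):
    img = img.astype(float)
    dh = np.abs(np.diff(img, axis=1))    # horizontal neighbor differences
    dv = np.abs(np.diff(img, axis=0))    # vertical neighbor differences
    h_bnd = dh[:, b - 1::b].mean()       # jumps across vertical block edges
    v_bnd = dv[b - 1::b, :].mean()       # jumps across horizontal block edges
    h_in = np.delete(dh, np.s_[b - 1::b], axis=1).mean()
    v_in = np.delete(dv, np.s_[b - 1::b], axis=0).mean()
    return (h_bnd + v_bnd) / (h_in + v_in + 1e-12)   # >1 suggests blocking
```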

112 citations


Cited by
Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
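The core index has a closed form. Here is a minimal sketch computing it globally over two images; the published SSIM evaluates the same expression within a sliding (Gaussian-weighted) local window and averages the resulting quality map:

```python
# SSIM(x, y) = (2*mu_x*mu_y + c1)(2*sigma_xy + c2) /
#              ((mu_x^2 + mu_y^2 + c1)(sigma_x^2 + sigma_y^2 + c2))
import numpy as np

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2        # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```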

40,609 citations

Book
01 Jan 1998
TL;DR: An introduction to wavelet signal processing, proceeding from Fourier and time-frequency analysis through wavelet bases, wavelet packets, and local cosine bases to approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings Article
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
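The two-part objective described here (an adversarial loss that is itself learned, plus a term pulling outputs toward the target) can be sketched as follows. G and D are assumed to be ordinary PyTorch modules; pix2pix's actual U-Net generator and PatchGAN discriminator are elided:

```python
# Sketch of the pix2pix training losses: D judges (input, output) pairs;
# G tries to fool D while staying close to the ground truth in L1.
import torch
import torch.nn.functional as F

def pix2pix_losses(G, D, x, y, lam=100.0):
    fake = G(x)
    real_logits = D(x, y)              # D sees the conditioning input too
    fake_logits = D(x, fake.detach())  # detach: D's update ignores G
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    gen_logits = D(x, fake)
    g_loss = (
        F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
        + lam * F.l1_loss(fake, y)     # lambda = 100 in the paper
    )
    return d_loss, g_loss
```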

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks, as discussed by the authors, offer a general-purpose solution to image-to-image translation problems that can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal Article
01 Apr 1988, Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known especially about mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveal a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, which are characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones.

9,929 citations