Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher from the University of Texas at Austin. The author has contributed to research in topics: Image quality & Video quality. The author has an h-index of 102 and has co-authored 837 publications receiving 96088 citations. Previous affiliations of Alan C. Bovik include University of Illinois at Urbana–Champaign & University of Sydney.


Papers
Book ChapterDOI
19 Dec 2017
TL;DR: The principal hypothesis of structural similarity-based image quality assessment is that the human visual system (HVS) is highly adapted to extract structural information from the visual field, and therefore a measurement of structural similarity (or distortion) should provide a good approximation to perceived image quality.
Abstract: This chapter presents structural similarity as an alternative design philosophy for objective image quality assessment methods. It discusses the motivation, the general idea, and a specific structural similarity (SSIM) index algorithm of the structural similarity-based image quality assessment method. Many image quality assessment algorithms have been shown to behave consistently when applied to distorted images created from the same original image, using the same type of distortions. The SSIM indexing algorithm is quite encouraging not only because it achieves good quality prediction accuracy in the current tests, but also because of its simple formulation and low complexity implementation. The principal hypothesis of structural similarity based image quality assessment is that the human visual system is highly adapted to extract structural information from the visual field, and therefore a measurement of structural similarity should provide a good approximation to perceived image quality.
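
As a rough illustration of the kind of index the abstract describes, the sketch below computes the commonly published SSIM expression over a single window of pixel values. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function name `global_ssim`, the flat-list input, and the constants `k1 = 0.01` and `k2 = 0.03` (widely used defaults) are assumptions, and practical implementations usually evaluate the expression over local windows and average the results.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length sequences of pixel values.

    Assumption: the whole input is treated as one window. k1 and k2 are the
    commonly used stabilising constants, not values taken from the abstract.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Small constants that keep the ratio stable when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))

    # Luminance term times combined contrast/structure term.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Under these assumptions, comparing a signal with itself gives a score of 1, and any change in mean, contrast, or structure lowers the score.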

70 citations

01 Jan 1996
TL;DR: A new theory of multidimensional AM-FM signal modeling is presented, and foundations are developed for analyzing modulations in arbitrarily dimensioned continuous and discrete signals using nonlinear demodulation operators related to the Teager-Kaiser operator.
Abstract: A new theory of multidimensional AM-FM signal modeling is presented in this dissertation. AM-FM image functions generalize the 2D Fourier transform kernel by admitting arbitrarily varying amplitude and phase modulations. Thus, they are inherently capable of capturing essential nonstationary, yet locally coherent image structures. Often, such nonstationarities contribute significantly to visual perception and interpretation. Recently, AM-FM signal modeling has been successfully applied to a number of problems characterized by such nonstationarities. Examples include speech recognition and analysis, image texture modeling, analysis, and segmentation, 3D shape from texture, and phase-based computational stereopsis. The first half of the dissertation focuses on the fundamental motivation and principles of AM-FM modeling. Foundations are developed for analyzing modulations in arbitrarily dimensioned continuous and discrete signals using nonlinear demodulation operators related to the Teager-Kaiser operator. In the second half of the dissertation, practical approaches are presented for extracting AM-FM sub-image information from digital images and for computing multi-component AM-FM image representations. Biologically motivated multiband Gabor filter banks are used for isolating AM-FM image multi-components on a spatio-spectrally localized basis, and optimal filters are designed for tracking multi-components across the filterbank channel responses using a statistical state-space component model. Techniques for recovering the essential structure of an image from its computed AM-FM representation are also developed. Two main computational paradigms are presented in detail. The first, called dominant component analysis, delivers estimates of the emergent frequencies and dominant amplitude modulations of an image. These estimates are useful in a variety of image processing and machine vision tasks, including shape from texture and texture-based stereopsis. 
Texture segmentation using the estimated dominant component modulating functions is also demonstrated. In the second main paradigm, full multi-component AM-FM image representations are computed. Exciting future applications of such representations include AM-FM-based image and video coding for multimedia communications and CD-ROM mass storage systems.
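
For orientation, the classical one-dimensional discrete Teager-Kaiser energy operator mentioned in the abstract can be sketched as follows. This is only the standard 1D operator, not the multidimensional demodulation operators developed in the dissertation, and the function name `teager_kaiser` is an assumption made for illustration.

```python
import math


def teager_kaiser(x):
    """Discrete 1D Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples only, so the output is two
    samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def sampled_cosine(amplitude, omega, phase, length):
    """Hypothetical helper: samples of amplitude * cos(omega * n + phase)."""
    return [amplitude * math.cos(omega * n + phase) for n in range(length)]
```

For a pure sampled cosine with amplitude A and frequency omega (radians per sample), the operator output is the constant A^2 sin^2(omega), which is why it can serve as a building block for separating amplitude and frequency information.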

70 citations

Proceedings ArticleDOI
02 Nov 2020
TL;DR: A large-scale comparative evaluation is conducted to assess the capabilities and limitations of multiple temporal pooling strategies on blind VQA of user-generated videos, and an ensemble pooling model built on top of high-performing temporal pooling models is proposed.
Abstract: Many objective video quality assessment (VQA) algorithms include a key step of temporal pooling of frame-level quality scores. However, less attention has been paid to studying the relative efficiencies of different pooling methods on no-reference (blind) VQA. Here we conduct a large-scale comparative evaluation to assess the capabilities and limitations of multiple temporal pooling strategies on blind VQA of user-generated videos. The study yields insights and general guidance regarding the application and selection of temporal pooling models. In addition, we propose an ensemble pooling model built on top of high-performing temporal pooling models. Our experimental results demonstrate the relative efficacies of the evaluated temporal pooling models, using several popular VQA algorithms evaluated on two recent large-scale natural video quality databases. Finally, we provide an empirical recipe for applying temporal pooling of frame-based quality predictions.
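
To make the idea of temporal pooling concrete, the sketch below collapses a list of frame-level quality scores into a single video-level score using a few generic strategies. It is a minimal sketch: the function name `pool_scores`, the method labels, and the choice of strategies are assumptions for illustration and are not claimed to be the set of pooling models evaluated in the paper.

```python
import math
from statistics import fmean, harmonic_mean


def pool_scores(scores, method="mean", percentile=20.0):
    """Pool per-frame quality scores into one video-level score.

    Assumed conventions: higher score means better quality, and scores are
    positive (needed by the harmonic mean).
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    if method == "mean":
        # Every frame contributes equally.
        return fmean(scores)
    if method == "harmonic":
        # Emphasises low-quality frames more than the arithmetic mean does.
        return harmonic_mean(scores)
    if method == "min":
        # The worst frame determines the video score.
        return min(scores)
    if method == "percentile":
        # Average of the lowest `percentile` percent of frames (at least one frame).
        k = max(1, math.ceil(len(scores) * percentile / 100.0))
        return fmean(sorted(scores)[:k])
    raise ValueError(f"unknown pooling method: {method}")
```

Strategies such as the harmonic mean or low-percentile pooling weight poor frames more heavily, reflecting the intuition that brief severe degradations can dominate perceived quality.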

69 citations

Patent
17 Jun 1991
TL;DR: An improved method for coding and decoding still or moving visual pattern images by partitioning images into blocks or cubes, respectively, and coding each image separately according to visually significant responses of the human eye. Coding is achieved by calculating and subtracting a mean intensity value from digital numbers within each block or cube and detecting visually perceivable edge locations within the resultant residual sub-image.
Abstract: An improved method for coding and decoding still or moving visual pattern images by partitioning images into blocks or cubes, respectively, and coding each image separately according to visually significant responses of the human eye. Coding is achieved by calculating and subtracting a mean intensity value from digital numbers within each block or cube and detecting visually perceivable edge locations within the resultant residual sub-image. If a visually perceivable edge is contained within the block or cube, gradient magnitude and orientation at opposing sides of the edge within each edge block or cube are calculated and appropriately coded. If no perceivable edge is contained within the block or cube, the sub-image is coded as a uniform intensity block. Decoding requires receiving coded mean intensity value, gradient magnitude and pattern code, and then decoding a combination of these three indicia to be arranged in an orientation substantially similar to the original digital image or original sequence of digital images. Coding and decoding can be accomplished in a hierarchical pattern. Further, hierarchical processing can be programmably manipulated according to user-defined criteria.

68 citations

Journal ArticleDOI
16 May 1998
TL;DR: FOVEA, a foveated vergent active stereo vision system, actively directs a pair of vergent stereo cameras to fixate on surfaces in a scene, performing multiresolution surface depth recovery at each fixation point, and accumulating and integrating a multiresolution map of surface depth over multiple successive fixations.
Abstract: We introduce FOVEA: a foveated vergent active stereo vision system. FOVEA actively directs a pair of vergent stereo cameras to fixate on surfaces in a scene, performing multiresolution surface depth recovery at each fixation point, and accumulating and integrating a multiresolution map of surface depth over multiple successive fixations. Several features of the system are novel: an active foveated image sampling and processing strategy is shown to greatly simplify the problem of establishing correspondence; a probabilistic fixation strategy is developed that is driven by the scene structure; the system uses the fixation strategy to recover local depth maps at a high resolution at multiple fixation points, eventually mapping the entire scene; and finally, the local maps are integrated as they are acquired into a global depth map.

67 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

40,609 citations

Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems; the approach is shown to be effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. As a result, very little is known, especially about mud-dominated calciclastic submarine fan systems. Presented in this study is a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations