Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher at the University of Texas at Austin. He has contributed to research on image quality and video quality, has an h-index of 102, and has co-authored 837 publications receiving 96,088 citations. Previous affiliations include the University of Illinois at Urbana-Champaign and the University of Sydney.


Papers
Proceedings ArticleDOI
TL;DR: 3-SSIM (or 3-MS-SSIM) provides results consistent with human subjectivity when assessing the quality of blurred and noisy images, and also delivers better performance than SSIM (and MS-SSIM) on five types of distorted images from the LIVE Image Quality Assessment Database.
Abstract: The assessment of image quality is very important for numerous image processing applications, where the goal of image quality assessment (IQA) algorithms is to automatically assess the quality of images in a manner that is consistent with human visual judgment. Two prominent examples, the Structural Similarity Image Metric (SSIM) and Multi-Scale Structural Similarity (MS-SSIM), operate under the assumption that human visual perception is highly adapted for extracting structural information from a scene. Results of large human studies have shown that these quality indices perform very well relative to other methods. However, SSIM and other IQA algorithms are less effective when used to rate blurred and noisy images. We address this defect by considering a three-component image model, leading to the development of modified versions of SSIM and MS-SSIM, which we call three-component SSIM (3-SSIM) and three-component MS-SSIM (3-MS-SSIM). A three-component image model was proposed by Ran and Farvardin [13], wherein an image is decomposed into edges, textures and smooth regions. Different image regions have different importance for visual perception; thus, we apply different weights to the SSIM scores according to the region where they are calculated. Four steps are executed: (1) Calculate the SSIM (or MS-SSIM) map. (2) Segment the original (reference) image into three categories of regions (edges, textures and smooth regions): edge regions are found where a gradient magnitude estimate is large, smooth regions where it is small, and textured regions fall between these two thresholds. (3) Apply non-uniform weights to the SSIM (or MS-SSIM) values over the three regions: the weight for edge regions was fixed at 0.5, for textured regions at 0.25, and for smooth regions at 0.25. (4) Pool the weighted SSIM (or MS-SSIM) values, typically by taking their weighted average, thus defining a single quality index for the image (3-SSIM or 3-MS-SSIM). Our experimental results show that 3-SSIM (or 3-MS-SSIM) produces results consistent with human subjectivity when assessing the quality of blurred and noisy images, and also delivers better performance than SSIM (and MS-SSIM) on five types of distorted images from the LIVE Image Quality Assessment Database.
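The four-step pooling procedure above can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the authors' implementation: the gradient-threshold fractions `t_edge` and `t_smooth` are assumed values, while the 0.5/0.25/0.25 region weights come directly from the text.

```python
import numpy as np

def three_component_pool(ssim_map, reference, t_edge=0.12, t_smooth=0.06):
    """Pool an SSIM map with region-dependent weights (3-SSIM idea).

    The thresholds t_edge/t_smooth are illustrative fractions of the
    maximum gradient magnitude, not the paper's exact values.
    """
    gy, gx = np.gradient(reference.astype(float))
    gmag = np.hypot(gx, gy)                   # gradient magnitude estimate
    gmax = gmag.max() or 1.0                  # avoid division issues on flat images
    edge = gmag >= t_edge * gmax              # strong gradients -> edge regions
    smooth = gmag <= t_smooth * gmax          # weak gradients   -> smooth regions
    texture = ~edge & ~smooth                 # in between       -> textured regions
    # Fixed region weights from the abstract: 0.5 / 0.25 / 0.25.
    w = 0.5 * edge + 0.25 * texture + 0.25 * smooth
    return float((w * ssim_map).sum() / w.sum())
```

Because the result is a weighted average of the SSIM map, it stays within the range of the map's values; with a constant map of 1.0 it returns exactly 1.0 regardless of the segmentation.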

99 citations

Journal ArticleDOI
TL;DR: Natural scene statistics models are combined with multi-resolution decomposition methods to extract reliable features for QA; the results show that the algorithm correlates highly with human judgments when assessing blur distortion of images.
Abstract: The increasing number of demanding consumer video applications, as exemplified by cell phone and other low-cost digital cameras, has boosted interest in no-reference objective image and video quality assessment (QA) algorithms. In this paper, we focus on no-reference image and video blur assessment. We consider natural scene statistics models combined with multi-resolution decomposition methods to extract reliable features for QA. The algorithm is composed of three steps. First, a probabilistic support vector machine (SVM) is applied as a rough image quality evaluator. Then the detail image is used to refine the blur measurements. Finally, the blur information is pooled to predict the blur quality of images. The algorithm is tested on the LIVE Image Quality Database and the Real Blur Image Database; the results show that it correlates highly with human judgments when assessing blur distortion of images.
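As a rough illustration of why multi-resolution detail signals carry blur information, the NumPy sketch below computes one-level Haar detail energies and pools them into a single score. It is a hypothetical stand-in for the paper's NSS/SVM pipeline; `haar_detail_energy` and `blur_score` are names introduced here, not the authors' algorithm.

```python
import numpy as np

def haar_detail_energy(img):
    """One-level Haar detail energies as crude multi-resolution blur features.

    Blurring suppresses high-frequency detail, so lower detail energy
    suggests stronger blur. A simplified stand-in for the paper's features.
    """
    img = img.astype(float)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2   # even-sized crop
    a = img[:h:2, :w:2]; b = img[:h:2, 1:w:2]
    c = img[1:h:2, :w:2]; d = img[1:h:2, 1:w:2]
    lh = (a - b + c - d) / 4.0    # horizontal detail
    hl = (a + b - c - d) / 4.0    # vertical detail
    hh = (a - b - c + d) / 4.0    # diagonal detail
    return np.array([np.mean(lh**2), np.mean(hl**2), np.mean(hh**2)])

def blur_score(img):
    # Pooling reduced to a single scalar: total detail energy
    # (higher = sharper under this toy feature).
    return float(haar_detail_energy(img).sum())
```

A high-frequency pattern (e.g. a checkerboard) yields a strictly positive score, while a constant image scores zero, matching the intuition that blur removes detail energy.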

98 citations

Journal ArticleDOI
TL;DR: A Hammerstein-Wiener model is presented for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos, and it is shown to reliably predict the TVSQ of such videos.
Abstract: Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable flexible rate adaptation under varying channel conditions. Accurately predicting the users' quality of experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieve efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-duration videos with time-varying distortions was built and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online TVSQ prediction in HTTP-based streaming.
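The Hammerstein-Wiener structure itself (a static input nonlinearity, followed by a linear dynamic filter, followed by a static output nonlinearity) is simple to sketch. The function below is a generic illustration of that cascade with caller-supplied blocks, not the paper's fitted TVSQ model:

```python
import numpy as np

def hammerstein_wiener(u, f, b, a, g):
    """Hammerstein-Wiener cascade: static nonlinearity -> LTI filter -> static nonlinearity.

    u: input sequence (e.g., per-frame quality scores); f, g: callables for the
    input/output nonlinearities; (b, a): IIR filter coefficients. All choices
    here are illustrative placeholders, not the paper's identified model.
    """
    x = f(np.asarray(u, dtype=float))          # Hammerstein block
    y = np.zeros_like(x)
    for n in range(len(x)):                    # direct-form IIR filtering
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return g(y)                                # Wiener block
```

The linear block is what lets the model capture hysteresis: with e.g. `b=[0.5]`, `a=[1.0, -0.5]`, the output is a running recursive average that responds to quality changes with a lag, while the two static nonlinearities can model saturation in human ratings.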

97 citations

Journal ArticleDOI
TL;DR: This paper constructs a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions, and demonstrates the value of the new resource, called the LIVE Video Quality Challenge Database (LIVE-VQC), by comparing leading NR video quality predictors on it.
Abstract: The great variations of videographic skills, camera designs, compression and processing protocols, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers, and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real-world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, often commingled distortions that are impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Towards advancing NR video quality prediction, we constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4,776 unique participants took part in the study, yielding more than 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison of leading NR video quality predictors on it. This study is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download online.

97 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information; its promise is demonstrated through comparison with subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
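For reference, the core SSIM formula can be sketched as a single-window NumPy computation. The published index is computed over local 11x11 Gaussian-weighted windows and then averaged; this whole-image simplification only illustrates the central comparison with the standard stabilizing constants C1 and C2:

```python
import numpy as np

def ssim_index(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Global (single-window) SSIM between two 8-bit grayscale images.

    Core formula:
        ((2*mu_x*mu_y + C1) * (2*sigma_xy + C2)) /
        ((mu_x^2 + mu_y^2 + C1) * (sigma_x^2 + sigma_y^2 + C2))
    Local Gaussian windowing from the paper is omitted for clarity.
    """
    x = x.astype(float); y = y.astype(float)
    mx, my = x.mean(), y.mean()                       # luminance terms
    vx, vy = x.var(), y.var()                         # contrast terms
    cxy = ((x - mx) * (y - my)).mean()                # structure (covariance)
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))
```

When the two images are identical, the covariance equals the variance and both factors cancel, so the index is exactly 1; dissimilar images score lower.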

40,609 citations

Book
01 Jan 1998
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
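The "learned loss function" described here combines a conditional GAN term with an L1 reconstruction term. The NumPy sketch below illustrates that combined objective given precomputed discriminator scores; it is a didactic simplification (real training uses a deep convolutional discriminator and gradient descent), with lambda = 100 as in the pix2pix paper:

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Generator objective from the pix2pix formulation (illustrative sketch).

    d_fake: discriminator scores in (0, 1) for generated images conditioned
    on the input; fake/target: image arrays. Combines the non-saturating
    adversarial term with an L1 reconstruction term weighted by lam.
    """
    eps = 1e-12
    adv = -np.mean(np.log(d_fake + eps))     # fool the discriminator
    l1 = np.mean(np.abs(target - fake))      # stay close to ground truth
    return adv + lam * l1

def pix2pix_discriminator_loss(d_real, d_fake):
    """Discriminator objective: classify real (input, target) pairs as 1, fakes as 0."""
    eps = 1e-12
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
```

The L1 term is what pushes outputs to match the ground truth at low frequencies, while the adversarial term supplies the high-frequency realism that a hand-engineered pixel loss alone tends to blur away.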

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems, and can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known about mud-dominated calciclastic submarine fan systems in particular. Presented in this study is a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density turbidite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones.

9,929 citations