Author

Alan C. Bovik

Bio: Alan C. Bovik is an academic researcher from the University of Texas at Austin. The author has contributed to research in topics including image quality and video quality, has an h-index of 102, and has co-authored 837 publications receiving 96,088 citations. Previous affiliations of Alan C. Bovik include the University of Illinois at Urbana–Champaign and the University of Sydney.


Papers
Proceedings ArticleDOI
01 Feb 1991
TL;DR: In this article, the use of chromatic photometric constraints for solving the dense stereo correspondence problem is investigated, and a theoretical construction for developing dense stereo correspondence algorithms that use chromatic information is proposed.
Abstract: We investigate the use of chromatic information in dense stereo correspondence. Specifically, the chromatic photometric constraint, which is used to specify a mathematical optimality criterion for solving the dense stereo correspondence problem, is developed. The result is a theoretical construction for developing dense stereo correspondence algorithms which use chromatic information. The efficacy of using chromatic information via this construction is tested by implementing single- and multi-resolution versions of a stereo correspondence algorithm which uses simulated annealing as a means of solving the optimization problem. Results demonstrate that the use of chromatic information can significantly improve the performance of dense stereo correspondence.
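As a rough, hedged sketch of the general approach described above (not the authors' implementation): a dense disparity field is optimized by simulated annealing under a cost that combines a chromatic (full-RGB) photometric term with a smoothness term. The cost weight, cooling schedule, and disparity range below are illustrative assumptions.

```python
# Minimal sketch: dense stereo correspondence with a chromatic photometric
# cost, optimized by simulated annealing. Illustrative only; a real
# implementation would update the energy incrementally per move and wrap
# this in a coarse-to-fine (multi-resolution) loop.
import numpy as np

def chromatic_cost(left, right, disp, lam=0.1):
    """Data term: squared RGB difference between matched pixels; smoothness
    term: absolute disparity difference between horizontal neighbours."""
    h, w, _ = left.shape
    cols = np.arange(w)
    data = 0.0
    for y in range(h):
        matched = np.clip(cols - disp[y], 0, w - 1)   # left-to-right match
        data += np.sum((left[y, cols] - right[y, matched]) ** 2)
    smooth = np.sum(np.abs(np.diff(disp, axis=1)))
    return data + lam * smooth

def anneal_disparity(left, right, max_disp=16, iters=20000, t0=10.0):
    """Simulated annealing over a dense integer disparity field."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w, _ = left.shape
    rng = np.random.default_rng(0)
    disp = rng.integers(0, max_disp + 1, size=(h, w))
    energy = chromatic_cost(left, right, disp)
    for k in range(iters):
        t = t0 * (1.0 - k / iters) + 1e-6             # linear cooling schedule
        y, x = rng.integers(h), rng.integers(w)       # pick a random pixel
        old = disp[y, x]
        disp[y, x] = rng.integers(0, max_disp + 1)    # propose a new disparity
        new_energy = chromatic_cost(left, right, disp)
        accept = (new_energy <= energy or
                  rng.random() < np.exp((energy - new_energy) / t))
        if accept:
            energy = new_energy
        else:
            disp[y, x] = old                          # reject the move
    return disp
```

The point of the chromatic term is simply that matching full RGB vectors, rather than intensities alone, gives the optimality criterion more information with which to disambiguate correspondences.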

1 citation

01 Jan 1992
TL;DR: This work provides support for the hypothesis that the computation of LSF information is a general-purpose stage in low-level vision by finding LSF-based solutions to three different vision problems.
Abstract: The use of local spatial frequency (LSF) methods for analysis of texture images is explored. Analyses and solutions of three specific problems are presented: (1) the measurement of the fractal dimension of images locally and accurately; (2) the measurement of the three-dimensional orientation of planar textured surfaces; (3) the computation of the three-dimensional shape of curved textured surfaces. The solutions presented apply to surfaces with both rough and reflectance textures. All solutions have been implemented and tested on images of real surfaces, including ones with complex, irregular textures. The novelty of the work is in the use of LSF representations as the basis for the required computations. An LSF representation of any texture can be computed easily with low-level, parallelizable, decision-free computations. This is an advantage over traditional texture-element- and edge-element-based methods, which require pre-processing that is more complex, higher-level, not decision-free, and not applicable to all textures. The methods presented can use different LSF representations; however, Gabor wavelet decompositions are favored for their sampling efficiency and optimal joint localization in the spatial and spatial-frequency domains, which results in highly localized, accurate measurements. By finding LSF-based solutions to three different vision problems, this work provides support for the hypothesis that the computation of LSF information is a general-purpose stage in low-level vision. This work develops novel texture projection models that describe the relationships among image LSFs, surface LSFs, and surface geometry; these models are potentially applicable to other vision problems as well.
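As a rough illustration of an LSF representation built from a Gabor decomposition (the frequencies and orientations below are assumed parameters for the sketch, not the author's), one can record, at each pixel, the Gabor channel with the largest response magnitude:

```python
# Sketch of a local-spatial-frequency (LSF) representation via a Gabor filter
# bank: per pixel, keep the (frequency, orientation) of the strongest channel.
import numpy as np
from skimage.filters import gabor

def lsf_representation(image, frequencies=(0.05, 0.1, 0.2, 0.4), orientations=8):
    """Return per-pixel dominant frequency, orientation, and magnitude."""
    thetas = np.linspace(0, np.pi, orientations, endpoint=False)
    best_mag = np.zeros(image.shape)
    best_freq = np.zeros(image.shape)
    best_theta = np.zeros(image.shape)
    for f in frequencies:
        for theta in thetas:
            real, imag = gabor(image, frequency=f, theta=theta)
            mag = np.hypot(real, imag)        # complex Gabor response magnitude
            mask = mag > best_mag
            best_mag[mask] = mag[mask]
            best_freq[mask] = f
            best_theta[mask] = theta
    return best_freq, best_theta, best_mag
```

On a planar textured surface viewed under perspective, the dominant local frequency varies systematically across the image, and that variation carries information about surface orientation; this is the kind of relationship the texture projection models above formalize.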

1 citation

BookDOI
01 Jan 2002

1 citation

01 Jan 2015
TL;DR: It is found that this particular I/VQA model is not apt for evaluating collections with varied content, but that research into other I/VQA models is promising, and that their implementation at large scale can narrow the problem of curating very large digital video collections and lead to preservation and access decisions based on informed priorities.
Abstract: As the production, variety, and consumption of born-digital video grow, so does the demand for acquiring, curating, and preserving large-scale digital video collections. As a multidisciplinary team of curators, computer scientists, and video engineers, we explore the use of no-reference Image and Video Quality Assessment (I/VQA) algorithms, specifically BRISQUE in this paper, to automatically derive ranges of video quality. An important characteristic of these algorithms is that they are modeled on human perception. We run the algorithms in a High Performance Computing (HPC) environment to obtain results for many videos at the same time, accelerating time to results and improving precision in computing per-frame and per-video quality assessment scores. The results, which were evaluated quantitatively and qualitatively, suggest that BRISQUE identifies the distortions on which it was trained and performs well on videos that have natural scenes and no drastic scene changes. While we found that this particular model is not apt for evaluating collections with varied content, the results suggest that research into other I/VQA models is promising, and that their implementation at large scale can narrow the problem of curating very large digital video collections and lead to preservation and access decisions based on informed priorities.

Introduction

The use of video has become significant and pervasive in our daily lives, going beyond traditional education and entertainment functions into areas such as personal communications, criminal evidence, surveillance, and marketing. With this functional diversity comes a variety of formats, including advancing compression and editing mechanisms that facilitate video creation and distribution. The advancements in video technology are important to cultural institutions, which are responsible for documenting society and for preserving video collections. Over time, these video collections grow without bound, severely encumbering the curation task. Accordingly, collecting institutions realize that individual and manual inspection, the traditional approach to assessing video quality and making subsequent preservation and access decisions, is an insurmountable task. Instead, novel, reliable, and automated methods are required for this purpose. Motivated by the need to develop curation solutions for large and varied video collections, this project investigates the use of Image and Video Quality Assessment (I/VQA) algorithms to generate data-driven, perceptually relevant indicators of video quality levels for large video collections. I/VQA algorithms are designed to predict the subjective quality of a natural image or video that has been digitally acquired, processed, communicated, and displayed, as it would be perceived and reported by users [1]. Currently, such algorithms are used to assess the quality of images and videos in streaming applications and to dynamically correct their distortions. In this project we explore whether, and which, I/VQA algorithms can be used to conduct large-scale automated assessment from which the need for more in-depth video analysis can be prioritized. We conducted experiments to understand the adequacy and scope of the I/VQA algorithm BRISQUE, and to refine it, using a reference set of videos and a set of artistic videos as testbeds. All the experiments were run using High Performance Computing (HPC) resources.
Running parallel computational processes on HPC systems allows generating results for the individual frames of every video in a collection promptly and accurately within one workflow. Interpreting these results entailed a qualitative evaluation, that is, viewing videos with frame-level quality predictions along with a graph indicating a holistic measure of quality over the entire video. In the context of a digital curation project, experimenting with these algorithms in an HPC environment benefits from an interdisciplinary approach. As a collaboration between the Laboratory for Image and Video Engineering (LIVE, http://live.ece.utexas.edu), which conducts research in I/VQA, and the Texas Advanced Computing Center (TACC, http://www.tacc.utexas.edu), which deploys computational resources for open science research, our team combines the expertise of data curators and computational scientists with that of video engineers. In this paper we introduce the I/VQA algorithms, explain how they compare to current methods for estimating video quality in heritage video collections, describe the experiments conducted to understand the fitness of the model for assessing video collections, and discuss the results obtained from testing the model on reference video sets and on a regular video collection.

I/VQA Algorithms

State-of-the-art I/VQA algorithms are based on natural scene statistics (NSS), which function under the premise that natural scenes have statistical regularities. Because the human visual system is tuned to distinguish regularities from irregularities, statistics sensitive to these variations in regularity have been shown to correlate well with difference mean opinion scores (DMOS) of images and video. To successfully map these statistics to a single perceptual quality score, these algorithms are trained on images and videos that have corresponding opinion scores. The DMOS scores are computed from a set of subjective evaluations obtained from humans watching sets of videos that have specific types and degrees of distortions. These videos are rated using a continuous sliding scale with the labels "Worst," "Poor," "Fair," "Good," and "Excellent." The user scores are combined to compute the DMOS score on the range [0, 100], where 0 is "Excellent" and 100 is "Worst." These human scores are necessary for measuring the impact that different distortions have on perceptual quality [1]. I/VQA algorithms can be full-reference (FR) or no-reference (NR). The former require as input a high-quality reference image or video against which a distorted copy can be compared. In the context of curation, an FR algorithm, the Structural Similarity Index (SSIM), was used to verify whether, and to what degree, the conversion of original video files involved information loss [2]. By contrast, NR algorithms measure the perceived quality of images and videos for which there is no original or pristine version available for comparison [1]. We propose that NR algorithms could be useful for understanding a collection's quality without the need for humans to review each video. However, studies have to be conducted to understand which models can be used to assess quality in video collections that are varied in content and distortions. The focus of this paper is evaluating whether BRISQUE, an NR algorithm for image quality assessment that can also be used to assess video, is appropriate for digital video curation.
Related Work

Collecting institutions have traditionally focused on digitizing analogue video for preservation and access, and a number of video QC tools have been introduced for purposes of automatic and objective quality assessment of digitized files [3, 4]. This is a great improvement over the traditional approach in which humans reviewed the files to detect both errors originating in the analogue media that was digitized and errors resulting from the digitization process. Indeed, while humans can identify different types of video distortions, manually recording them with precision is extremely time-consuming and inconsistent [5]. Aside from individual differences, popular QC tools identify various types of artifacts and noise in individual frames and across frame differences, producing frame-by-frame features [3] or averaged features [4] for each type of detected distortion. In turn, these results have to be interpreted to derive a holistic quality condition per video. Therefore, while these tools assist the human curation task, none of them eliminates the need for humans to view the videos. To accurately assess the condition of a video in a perceptually relevant context, these features must be mapped to a quality score that correlates significantly with human-based DMOS scores. Our work differs in methods and scope from the above, serving a complementary function. As opposed to detecting errors based on distortion-specific filters and corresponding ranges of normalcy, we are introducing perceptual subjective measures based on models of the human visual system to understand the quality of individual digital videos within collections. Importantly, the scores produced by the I/VQA algorithms are statistically significant through their correlation with the consensus scores obtained from people who have rated the distortions in reference video sets. Such consensus can be understood as the collective interpretation of quality. In addition, our project does not focus on detecting analogue distortions or on evaluating the results of the digitization process, but on distortions that are typical of compression algorithms. Because we are interested in processing large video collections, we run the model on a supercomputer, allowing us to obtain DMOS predictions both holistically and at the per-frame scale. In addition, we performed a study without training on rated distortions, to remove subjectivity. In the following section we describe the testbed collections used to build and to evaluate our model, and the studies performed to determine its fitness for assessing the condition of large-scale video collections.
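For context, here is a minimal sketch of the natural-scene-statistics front end that BRISQUE builds on, applied frame by frame to a video: mean-subtracted contrast-normalized (MSCN) coefficients. The feature extraction and the trained regression that maps features to a DMOS-like score are omitted, and the window size and constants are illustrative rather than taken from the paper.

```python
# Per-frame MSCN coefficients, the NSS front end underlying BRISQUE.
import cv2
import numpy as np

def mscn(gray, ksize=7, sigma=7 / 6, C=1.0):
    """Mean-subtracted, contrast-normalized coefficients of a grayscale frame."""
    gray = gray.astype(np.float64)
    mu = cv2.GaussianBlur(gray, (ksize, ksize), sigma)           # local mean
    sigma_map = np.sqrt(np.abs(
        cv2.GaussianBlur(gray * gray, (ksize, ksize), sigma) - mu * mu))
    return (gray - mu) / (sigma_map + C)

def per_frame_stats(video_path):
    """Yield (frame index, variance of MSCN coefficients) for every frame.
    Heavily distorted frames tend to deviate from natural-scene statistics."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        yield idx, float(np.var(mscn(gray)))
        idx += 1
    cap.release()
```

In a workflow like the one described above, an independent per-frame computation of this kind is exactly what parallelizes cleanly across HPC nodes, since no frame depends on any other.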

1 citation


Cited by
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its promise is demonstrated through comparison with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
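For reference, a hedged usage sketch computing SSIM with scikit-image rather than the MATLAB implementation linked above; the file names are placeholders.

```python
# Compute the mean SSIM score and the per-pixel similarity map between a
# reference image and a distorted copy.
from skimage import img_as_float, io
from skimage.metrics import structural_similarity

reference = img_as_float(io.imread("reference.png", as_gray=True))
distorted = img_as_float(io.imread("distorted.png", as_gray=True))

score, ssim_map = structural_similarity(
    reference, distorted, data_range=1.0, full=True)
print(f"SSIM = {score:.4f}")   # 1.0 indicates structurally identical images
```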

40,609 citations

Book
01 Jan 1998
TL;DR: A chapter-level overview of the book, spanning an introduction to a transient world, Fourier and wavelet bases, wavelet packets and local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
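For reference, the objective that this kind of conditional image-to-image model optimizes can be written as a conditional GAN loss combined with an L1 reconstruction term (x: input image, y: target output, z: noise, G: generator, D: discriminator, λ: weight on the reconstruction term); the notation below is a standard rendering rather than a quotation from the paper.

```latex
\mathcal{L}_{\mathrm{cGAN}}(G,D) =
    \mathbb{E}_{x,y}\bigl[\log D(x,y)\bigr]
  + \mathbb{E}_{x,z}\bigl[\log\bigl(1 - D(x, G(x,z))\bigr)\bigr],
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\bigl[\lVert y - G(x,z)\rVert_1\bigr],
\qquad
G^{*} = \arg\min_{G}\max_{D}\;
    \mathcal{L}_{\mathrm{cGAN}}(G,D) + \lambda\,\mathcal{L}_{L1}(G).
```

The generator is trained to fool the discriminator while staying close to the target output in an L1 sense, which is what lets a single generic objective cover the different translation tasks listed above.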

11,958 citations

Posted Content
TL;DR: Conditional adversarial networks, as discussed by the authors, offer a general-purpose solution to image-to-image translation problems and can be used for synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known about mud-dominated calciclastic submarine fan systems in particular. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations