
Showing papers by Alan C. Bovik published in 2004


Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its promise is demonstrated through comparison with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
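
For readers who want to experiment, the following is a minimal sketch of the structural similarity computation described above, written in Python rather than the authors' MATLAB. It applies the published formula once over the whole image; the actual index averages the measure over local windows (an 11x11 Gaussian-weighted window in the paper), and the constants K1, K2, and L follow the paper's defaults.

```python
import numpy as np

def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Single-window SSIM between two grayscale images.

    The published index averages this measure over local
    Gaussian-weighted windows; computing it once over the
    whole image is a simplification for illustration.
    """
    x, y = x.astype(np.float64), y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2          # stabilizing constants
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # luminance, contrast, and structure comparisons folded into one ratio
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```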

40,609 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: This work proposes an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself, and demonstrates the performance of the algorithm by testing it on a data set of 779 images.
Abstract: Measurement of image quality is crucial for many image-processing algorithms. Traditionally, image quality assessment algorithms predict visual quality by comparing a distorted image against a reference image, typically by modeling the human visual system (HVS), or by using arbitrary signal fidelity criteria. We adopt a new paradigm for image quality assessment. We propose an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself. We use natural scene statistics (NSS) modeling in concert with an image degradation model and an HVS model. We demonstrate the performance of our algorithm by testing it on a data set of 779 images, and show that our method is competitive with state-of-the-art quality assessment methods, and outperforms them in our simulations.
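
A toy illustration of the information-fidelity idea, not the paper's exact NSS formulation: if each block of the reference is modeled as Gaussian and the distortion as a per-block gain plus additive noise, the Shannon information surviving the channel can be compared with the information in the reference alone. The block size, the neural-noise variance sigma_n2, and the least-squares channel fit are all assumptions made for this sketch.

```python
import numpy as np

def info_fidelity(ref, dist, block=8, sigma_n2=0.1):
    """Toy information-fidelity ratio under a scalar Gaussian model.

    Each reference block is treated as Gaussian, the distortion channel
    as a per-block gain g plus additive noise (fit by least squares),
    and sigma_n2 as visual 'neural noise'. The score is the information
    shared with the distorted image relative to the information in the
    reference alone -- a simplification of the paper's NSS-based model.
    """
    ref, dist = ref.astype(np.float64), dist.astype(np.float64)
    shared, total = 0.0, 0.0
    for i in range(0, ref.shape[0] - block + 1, block):
        for j in range(0, ref.shape[1] - block + 1, block):
            r = ref[i:i+block, j:j+block].ravel()
            d = dist[i:i+block, j:j+block].ravel()
            r0, d0 = r - r.mean(), d - d.mean()
            var_r = (r0 * r0).mean() + 1e-10
            cov_rd = (r0 * d0).mean()
            g = cov_rd / var_r                       # channel gain
            var_v = max((d0 * d0).mean() - g * cov_rd, 1e-10)
            shared += 0.5 * np.log2(1 + g * g * var_r / (var_v + sigma_n2))
            total  += 0.5 * np.log2(1 + var_r / sigma_n2)
    return shared / total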

1,349 citations


Journal ArticleDOI
TL;DR: A new philosophy in designing image and video quality metrics is followed, using structural distortion as an estimate of perceived visual distortion, and a computationally efficient full-reference (FR) video quality assessment algorithm is developed.
Abstract: Objective image and video quality measures play important roles in a variety of image and video processing applications, such as compression, communication, printing, analysis, registration, restoration, enhancement and watermarking. Most proposed quality assessment approaches in the literature are error sensitivity-based methods. In this paper, we follow a new philosophy in designing image and video quality metrics, which uses structural distortion as an estimate of perceived visual distortion. A computationally efficient approach is developed for full-reference (FR) video quality assessment. The algorithm is tested on the video quality experts group (VQEG) Phase I FR-TV test data set.

Keywords—Image quality assessment, video quality assessment, human visual system, error sensitivity, structural distortion, video quality experts group (VQEG)
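
To make the image-to-video step concrete, here is a hedged sketch of the simplest possible extension: score each frame with a structural similarity measure and average. The published algorithm additionally samples local windows and weights them by luminance and motion, so plain frame averaging is only a first approximation; `ssim_frame` stands in for any per-frame structural similarity function, such as the one sketched earlier.

```python
import numpy as np

def video_quality(ref_frames, dist_frames, ssim_frame):
    """Frame-averaged structural-distortion score for a video pair.

    ref_frames / dist_frames: iterables of aligned grayscale frames.
    ssim_frame: any per-frame structural similarity function.
    """
    scores = [ssim_frame(r, d) for r, d in zip(ref_frames, dist_frames)]
    return float(np.mean(scores))
```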

1,083 citations


01 Jan 2004
TL;DR: A Structural Similarity Index is developed and its promise is demonstrated through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempt to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.

1,081 citations


Dissertation
01 Jan 2004
TL;DR: This dissertation develops an information-theoretic approach to image quality assessment based on natural scene statistics, treating the distortion process as a channel that limits the flow of information from the reference image to the receiver, and shows that the proposed methods outperform current state-of-the-art methods in simulations.
Abstract: Measurement of image quality is crucial for designing image processing systems that could potentially degrade visual quality. Such measurements allow developers to optimize designs to deliver maximum quality while minimizing system cost. This dissertation is about automatic algorithms for quality assessment of digital images. Traditionally, researchers have equated image quality with image fidelity, or the closeness of a distorted image to a ‘reference’ image that is assumed to have perfect quality. This closeness is typically measured by modeling the human visual system, or by using different mathematical criteria for signal similarity. In this dissertation, I approach the problem from a novel direction. I claim that quality assessment algorithms deal only with images and videos that are meant for human consumption, and that these signals are almost exclusively images and videos of the visual environment. Image distortions make these so-called natural scenes look ‘unnatural’. I claim that this departure from ‘expected’ characteristics can be quantified for predicting visual quality. I present a novel information-theoretic approach to image quality assessment using statistical models for natural scenes. I approach the quality assessment problem as an information fidelity problem, in which the distortion process is viewed as a channel that limits the flow of information from a source of natural images to the receiver (the brain). I show that the quality of a test image is strongly related to the amount of statistical information about the reference image that is present in the test image. I also explore image quality assessment in the absence of the reference, and present a novel method for blindly quantifying the quality of images compressed by wavelet-based compression algorithms. I show that images are rendered unnatural by the quantization process during lossy compression, and that this unnaturalness can be quantified blindly for predicting visual quality. I test and validate the performance of the algorithms proposed in this dissertation through an extensive study in which ground truth data was obtained from many human subjects. I show that the methods presented can accurately predict visual quality, and that they outperform current state-of-the-art methods in my simulations.
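
As a hedged illustration of the blind-assessment idea from the dissertation's second part: heavy wavelet-domain quantization drives many detail coefficients to exactly zero, far more often than natural-scene statistics would predict, so the near-zero mass of a detail subband is a crude proxy for compression-induced unnaturalness. The Haar subband and the threshold eps are choices made for this sketch, not the dissertation's actual features.

```python
import numpy as np

def haar_detail(img):
    """One-level Haar diagonal detail subband (numpy only)."""
    img = img.astype(np.float64)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    a = img[:h:2, :w:2]; b = img[:h:2, 1:w:2]
    c = img[1:h:2, :w:2]; d = img[1:h:2, 1:w:2]
    return (a - b - c + d) / 2.0

def unnaturalness(img, eps=1e-3):
    """Toy blind score: the fraction of near-zero detail coefficients,
    which lossy quantization inflates well beyond what natural-scene
    statistics predict. Illustrative only; the dissertation's actual
    features and thresholds differ."""
    coeffs = haar_detail(img).ravel()
    return float(np.mean(np.abs(coeffs) < eps))
```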

146 citations


Proceedings ArticleDOI
07 Jun 2004
TL;DR: By analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, this work recovered CI templates that indeed resembled the target, and demonstrated that these CI templates are useful in predicting stimulus regions that draw human fixations in search tasks.
Abstract: Seemingly complex tasks like visual search can be analyzed using a cognition-free, bottom-up framework. We sought to reveal strategies used by observers in visual search tasks using accurate eye tracking and image analysis at the point of gaze. Observers were instructed to search for simple geometric targets embedded in 1/f noise. By analyzing the stimulus at the point of gaze using the classification image (CI) paradigm, we discovered CI templates that indeed resembled the target. No such structure emerged for a random searcher. We demonstrate, qualitatively and quantitatively, that these CI templates are useful in predicting stimulus regions that draw human fixations in search tasks. Filtering a 1/f noise stimulus with a CI results in a ‘fixation prediction map’. A qualitative evaluation of the prediction was obtained by overlaying k-means clusters of observers’ fixations on the prediction map. The fixations clustered around the local maxima in the prediction map. To obtain a quantitative comparison, we computed the Kullback-Leibler distance between the recorded fixations and the prediction. Using random-searcher CIs in Monte Carlo simulations, a distribution of this distance was obtained. The z-scores for the human CIs and the original target were -9.70 and -9.37 respectively, indicating that even in noisy stimuli, observers deploy their fixations efficiently to likely targets rather than casting them randomly, hoping to fortuitously find the target.
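
The pipeline in the abstract translates almost directly into code. A sketch under stated assumptions (the paper's exact normalization of the maps is not specified here): correlate the noise stimulus with the CI template to obtain a fixation prediction map, then compare the observed fixation histogram with that map using the Kullback-Leibler distance.

```python
import numpy as np
from scipy.signal import fftconvolve

def prediction_map(stimulus, ci_template):
    """Filter the stimulus with a classification-image template to get
    a fixation prediction map (correlation implemented as convolution
    with the flipped template), normalized to a probability map."""
    t = ci_template - ci_template.mean()
    m = fftconvolve(stimulus, t[::-1, ::-1], mode='same')
    m -= m.min()
    return m / m.sum()

def kl_distance(fix_hist, pred_map, eps=1e-12):
    """KL divergence between the observed fixation histogram and the
    prediction map (both defined over the same grid)."""
    p = fix_hist / fix_hist.sum()
    q = pred_map / pred_map.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```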

52 citations


01 Jan 2004
TL;DR: Foveated image and video coding systems achieve increased compression efficiency by removing considerable high-frequency information redundancy from the regions away from the fixation point without significant loss of the reconstructed image or video quality.
Abstract: The human visual system (HVS) is highly space-variant in its sampling, coding, processing, and understanding of visual information. Visual sensitivity is highest at the point of fixation and decreases dramatically with distance from it. By taking advantage of this phenomenon, foveated image and video coding systems achieve increased compression efficiency by removing considerable high-frequency redundancy from regions away from the fixation point, without significant loss of reconstructed image or video quality.
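
A crude way to visualize the space-variant idea, not a coding scheme: blend progressively blurred copies of an image so that blur grows with eccentricity from a fixation point. Real foveated coders instead discard high-frequency subband data away from fixation; the linear eccentricity-to-blur-level mapping below is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(img, fix_xy, n_levels=5):
    """Blend a blur pyramid so blur grows with distance from fixation.

    fix_xy: (x, y) fixation point in pixel coordinates.
    """
    img = img.astype(np.float64)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    ecc = np.hypot(xs - fix_xy[0], ys - fix_xy[1])
    level = (n_levels - 1) * ecc / ecc.max()       # 0 = sharp at fixation
    # pyramid of increasingly blurred copies (sigma 0 leaves img unchanged)
    pyramid = [gaussian_filter(img, sigma=2.0 ** k - 1.0)
               for k in range(n_levels)]
    lo = np.floor(level).astype(int)
    hi = np.minimum(lo + 1, n_levels - 1)
    w = level - lo                                 # blend weight between levels
    out = np.zeros_like(img)
    for k in range(n_levels):
        out += np.where(lo == k, 1 - w, 0) * pyramid[k]
        out += np.where(hi == k, w, 0) * pyramid[k]
    return out
```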

37 citations


Proceedings ArticleDOI
24 Oct 2004
TL;DR: This paper presents a statistical model for estimating the distortion introduced in progressive JPEG compressed images due to quantization and channel bit errors in a joint manner and presents an unequal power allocation scheme as a simple application of the model.
Abstract: The need for efficient joint source-channel coding is growing as new multimedia services are introduced in commercial wireless communication systems. An important component of practical joint source-channel coding schemes is a distortion model to measure the quality of compressed digital multimedia such as images and videos. Unfortunately, models for estimating the distortion due to quantization and channel bit errors in a combined fashion do not appear to be available for practical image or video coding standards. This paper presents a statistical model for estimating the distortion introduced in progressive JPEG compressed images due to both quantization and channel bit errors. Important compression techniques such as Huffman coding, DPCM coding, and run-length coding are included in the model. Examples show that the distortion in terms of peak signal-to-noise ratio (PSNR) can be predicted within a 2 dB maximum error.
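
A back-of-the-envelope version of what such a distortion model must deliver, assuming (as this sketch does, not the paper) that quantization and channel-error MSE contributions are independent and additive: convert an expected total MSE into a predicted PSNR. `mse_per_error`, the average extra MSE caused by one corrupted codeword, is exactly the quantity the paper's statistical model of Huffman/DPCM/run-length decoding estimates; here it is just a hypothetical input.

```python
import numpy as np

def predicted_psnr(mse_quant, p_bit, bits_per_image, mse_per_error):
    """Toy joint source-channel distortion estimate.

    mse_quant:     MSE due to quantization alone.
    p_bit:         channel bit-error rate.
    mse_per_error: expected extra MSE per corrupted codeword
                   (a hypothetical aggregate input in this sketch).
    """
    expected_errors = p_bit * bits_per_image
    mse_total = mse_quant + expected_errors * mse_per_error
    return 10 * np.log10(255.0 ** 2 / mse_total)

# e.g. predicted_psnr(mse_quant=30.0, p_bit=1e-5,
#                     bits_per_image=2e5, mse_per_error=50.0)
```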

36 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: Automatic segmentation and classification of M-FISH chromosome images are jointly performed using a six-feature, 25-class maximum-likelihood classifier and high correct classification results are obtained.
Abstract: Automatic segmentation and classification of M-FISH chromosome images are jointly performed using a six-feature, 25-class maximum-likelihood classifier. Preprocessing steps, including background correction and a six-channel color compensation method, are introduced, along with a feature transformation based on the spherical coordinate transformation. High correct-classification rates are obtained.
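
For concreteness, here is a minimal sketch of the two ingredients named in the abstract: a spherical-coordinate feature transform (shown for a 3-channel vector; the paper's six fluor channels would yield a magnitude plus five angles) and a per-pixel Gaussian maximum-likelihood classifier. The class means and covariances are assumed to come from training data; nothing here reproduces the paper's exact features.

```python
import numpy as np

def spherical_features(channels):
    """Spherical-coordinate transform of a multichannel pixel vector:
    overall magnitude plus direction angles, decoupling intensity from
    'color' direction. Shown here for 3 channels."""
    x = np.asarray(channels, dtype=np.float64)
    r = np.linalg.norm(x)
    theta = np.arccos(x[2] / (r + 1e-12))
    phi = np.arctan2(x[1], x[0])
    return np.array([r, theta, phi])

def ml_classify(features, means, covs):
    """Gaussian maximum-likelihood labeling: assign each feature vector
    to the class (one of 25 chromosome classes in the paper) with the
    highest log-likelihood under a per-class Gaussian model."""
    n_classes = len(means)
    scores = np.empty((features.shape[0], n_classes))
    for k in range(n_classes):
        inv = np.linalg.inv(covs[k])
        _, logdet = np.linalg.slogdet(covs[k])
        d = features - means[k]
        # squared Mahalanobis distance per sample, plus log-determinant
        scores[:, k] = -0.5 * (np.einsum('ij,jk,ik->i', d, inv, d) + logdet)
    return scores.argmax(axis=1)
```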

27 citations


Book ChapterDOI
01 Jan 2004
TL;DR: A novel AM-FM (amplitude-modulation, frequency-modulation) image representation is adapted to process latent fingerprint images, and a comparison to expert human analysis is made.
Abstract: We adapt a novel AM-FM (amplitude-modulation, frequency-modulation) image representation to process latent fingerprint images. The AM-FM representation captures, in a very natural way, the global flow of fingerprint patterns. Discontinuities in this flow are easily detectable as potential fingerprint minutiae. We demonstrate the application of an AM-FM-based system to actual latent fingerprints and compare the results to expert human analysis.
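
A one-dimensional sketch of AM-FM demodulation via the analytic signal (the fingerprint system uses a 2-D multiband version, but the principle carries over): the amplitude envelope captures ridge contrast, while the instantaneous frequency captures local ridge spacing, whose discontinuities flag candidate minutiae.

```python
import numpy as np
from scipy.signal import hilbert

def am_fm_demodulate(signal):
    """1-D AM-FM demodulation via the analytic signal."""
    z = hilbert(signal.astype(np.float64))      # analytic signal
    amplitude = np.abs(z)                       # AM component (envelope)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) / (2 * np.pi)    # FM component (cycles/sample)
    return amplitude, inst_freq
```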

1 citation