
Showing papers by "Alan C. Bovik" published in 2005


Journal ArticleDOI
TL;DR: This paper proposes a novel information fidelity criterion that is based on natural scene statistics and derives a novel QA algorithm that provides clear advantages over the traditional approaches and outperforms current methods in testing.
Abstract: Measurement of visual quality is of fundamental importance to numerous image and video processing applications. The goal of quality assessment (QA) research is to design algorithms that can automatically assess the quality of images or videos in a perceptually consistent manner. Traditionally, image QA algorithms interpret image quality as fidelity or similarity with a "reference" or "perfect" image in some perceptual space. Such "full-reference" QA methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychovisual features of the human visual system (HVS), or by arbitrary signal fidelity criteria. In this paper, we approach the problem of image QA by proposing a novel information fidelity criterion that is based on natural scene statistics. QA systems are invariably involved with judging the visual quality of images and videos that are meant for "human consumption". Researchers have developed sophisticated models to capture the statistics of natural signals, that is, pictures and videos of the visual environment. Using these statistical models in an information-theoretic setting, we derive a novel QA algorithm that provides clear advantages over the traditional approaches. In particular, it is parameterless and outperforms current methods in our testing. We validate the performance of our algorithm with an extensive subjective study involving 779 images. We also show that, although our approach distinctly departs from traditional HVS-based methods, it is functionally similar to them under certain conditions, yet it outperforms them due to improved modeling. The code and the data from the subjective study are available at [1].

1,334 citations
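
For readers who want a concrete starting point, the following is a minimal pixel-domain sketch of an information-fidelity-style quality ratio under a simplified local Gaussian source and gain-plus-noise distortion model. It is not the published algorithm; the window size and the stabilizing noise variance sigma_n2 are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def info_fidelity(ref, dist, win=9, sigma_n2=0.1):
    """Toy information-fidelity ratio between a reference and a distorted image.

    Each local patch of the distorted image is modeled as a gain-plus-additive-noise
    version of the reference patch; the score compares the information the distorted
    and reference signals carry about the underlying source. Not the published code.
    """
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)

    mu_r = uniform_filter(ref, win)
    mu_d = uniform_filter(dist, win)
    var_r = np.maximum(uniform_filter(ref * ref, win) - mu_r ** 2, 0.0)
    var_d = np.maximum(uniform_filter(dist * dist, win) - mu_d ** 2, 0.0)
    cov = uniform_filter(ref * dist, win) - mu_r * mu_d

    g = cov / (var_r + 1e-10)               # local gain of the distortion "channel"
    sv2 = np.maximum(var_d - g * cov, 0.0)  # residual additive distortion variance

    num = np.sum(np.log2(1.0 + g * g * var_r / (sv2 + sigma_n2)))
    den = np.sum(np.log2(1.0 + var_r / sigma_n2))
    return num / den
```

A score near 1 means the distorted image preserves most of the reference's local information; heavier distortion drives the ratio toward 0.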


Journal ArticleDOI
TL;DR: It is claimed that natural scenes contain nonlinear dependencies that are disturbed by the compression process, and that this disturbance can be quantified and related to human perceptions of quality.
Abstract: Measurement of image or video quality is crucial for many image-processing algorithms, such as acquisition, compression, restoration, enhancement, and reproduction. Traditionally, image quality assessment (QA) algorithms interpret image quality as similarity with a "reference" or "perfect" image. The obvious limitation of this approach is that the reference image or video may not be available to the QA algorithm. The field of blind, or no-reference, QA, in which image quality is predicted without the reference image or video, has been largely unexplored, with algorithms focussing mostly on measuring the blocking artifacts. Emerging image and video compression technologies can avoid the dreaded blocking artifact by using various mechanisms, but they introduce other types of distortions, specifically blurring and ringing. In this paper, we propose to use natural scene statistics (NSS) to blindly measure the quality of images compressed by JPEG2000 (or any other wavelet based) image coder. We claim that natural scenes contain nonlinear dependencies that are disturbed by the compression process, and that this disturbance can be quantified and related to human perceptions of quality. We train and test our algorithm with data from human subjects, and show that reasonably comprehensive NSS models can help us in making blind, but accurate, predictions of quality. Our algorithm performs close to the limit imposed on useful prediction by the variability between human subjects.

612 citations
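
The abstract attributes quality loss to disturbed dependencies between wavelet coefficients across scales. The sketch below is only a rough illustration of that idea, not the paper's NSS model: it uses PyWavelets to measure the correlation between coefficient magnitudes at adjacent scales, a dependency that heavy wavelet-domain quantization tends to weaken. The wavelet choice and decomposition depth are assumptions.

```python
import numpy as np
import pywt

def parent_child_magnitude_corr(img, wavelet="db4", levels=3):
    """Average correlation between |child| and upsampled |parent| wavelet magnitudes."""
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=levels)
    corrs = []
    # coeffs[1] holds the coarsest detail subbands, coeffs[-1] the finest
    for lvl in range(1, levels):
        for band in range(3):  # horizontal, vertical, diagonal orientations
            parent = np.abs(coeffs[lvl][band])
            child = np.abs(coeffs[lvl + 1][band])
            # Upsample the parent by pixel repetition to (roughly) the child's grid
            up = np.kron(parent, np.ones((2, 2)))[: child.shape[0], : child.shape[1]]
            corrs.append(np.corrcoef(up.ravel(), child.ravel())[0, 1])
    return float(np.mean(corrs))
```

Comparing this statistic on pristine versus heavily compressed images gives a feel for the kind of cross-scale disturbance the paper quantifies with a full NSS model.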




Journal ArticleDOI
TL;DR: It is shown that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes and can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies.
Abstract: Traditional chromosome imaging has been limited to grayscale images, but recently a 5-fluorophore combinatorial labeling technique (M-FISH) was developed wherein each class of chromosomes binds with a different combination of fluorophores. This results in a multispectral image, where each class of chromosomes has distinct spectral components. In this paper, we develop new methods for automatic chromosome identification by exploiting the multispectral information in M-FISH chromosome images and by jointly performing chromosome segmentation and classification. We 1) develop a maximum-likelihood hypothesis test that uses multispectral information, together with conventional criteria, to select the best segmentation possibility; 2) use this likelihood function to combine chromosome segmentation and classification into a robust chromosome identification system; and 3) show that the proposed likelihood function can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies, which can be indicators of radiation damage, cancer, and a wide variety of inherited diseases. We show that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes. We also show that it outperforms past M-FISH classification techniques that do not use segmentation information.

89 citations
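
As a toy illustration of the likelihood machinery only: model each chromosome class as a multivariate Gaussian over the five M-FISH spectral channels and pick the class with the highest log-likelihood. The paper's hypothesis test scores whole segmentation possibilities rather than single pixels, so this sketch covers just the core classification step; the data layout is assumed.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_models(pixels_by_class):
    """Fit a (mean, covariance) Gaussian per chromosome class.

    pixels_by_class: dict mapping class label -> (N_k, 5) array of 5-channel
    M-FISH pixel vectors taken from labeled training images (assumed format).
    """
    return {label: (X.mean(axis=0), np.cov(X, rowvar=False))
            for label, X in pixels_by_class.items()}

def classify_pixels(pixels, models):
    """Assign each 5-channel pixel to the class with maximum log-likelihood."""
    labels = list(models)
    loglik = np.stack(
        [multivariate_normal.logpdf(pixels, mean=mu, cov=cov, allow_singular=True)
         for mu, cov in (models[l] for l in labels)],
        axis=-1)
    return np.array(labels)[np.argmax(loglik, axis=-1)], loglik
```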


Journal ArticleDOI
TL;DR: A fully automatic chromosome classification algorithm for Multiplex Fluorescence In Situ Hybridization (M-FISH) images using supervised parametric and non-parametric techniques is described.

63 citations


Proceedings ArticleDOI
29 Apr 2005
TL;DR: A key aspect of this work is that each parameter of the filter has been incorporated to capture the variation in physical characteristics of spiculated masses and architectural distortions and that the parameters of the stage-one detection algorithm are determined by the physical measurements.
Abstract: Mass detection algorithms generally consist of two stages. The aim of the first stage is to detect all potential masses. In the second stage, the aim is to reduce the false-positives by classifying the detected objects as masses or normal tissue. In this paper, we present a new evidence based, stage-one algorithm for the detection of spiculated masses and architectural distortions. By evidence based, we mean that we use the statistics of the physical characteristics of these abnormalities to determine the parameters of the detection algorithm. Our stage-one algorithm consists of two steps, an enhancement step followed by a filtering step. In the first step, we propose a new technique for the enhancement of spiculations in which a linear filter is applied to the Radon transform of the image. In the second step, we filter the enhanced images with a new class of linear image filters called Radial Spiculation Filters. We have invented these filters specifically for detecting spiculated masses and architectural distortions that are marked by converging lines or spiculations. These filters are highly specific narrowband filters, which are designed to match the expected structures of these abnormalities and form a new class of wavelet-type filterbanks derived from optimal theories of filtering. A key aspect of this work is that each parameter of the filter has been incorporated to capture the variation in physical characteristics of spiculated masses and architectural distortions and that the parameters of the stage-one detection algorithm are determined by the physical measurements.

61 citations
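
The enhancement step applies a linear filter to the Radon transform so that long, thin structures (spiculations) are accentuated. The Radial Spiculation Filters themselves are not reproduced here; the sketch below only shows the generic pipeline (Radon transform, 1-D filtering along the detector axis, inverse Radon), with an assumed difference-of-Gaussians filter standing in for the paper's evidence-based design.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from skimage.transform import radon, iradon

def enhance_linear_structures(image, n_angles=180, sigma_narrow=1.0, sigma_wide=5.0):
    """Enhance line-like structures by filtering the image's Radon transform.

    A straight, thin structure integrates to a sharp peak along the detector axis
    of the sinogram, so a band-pass filter along that axis boosts such peaks
    before reconstruction. Filter widths here are illustrative, not the paper's.
    """
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(image.astype(np.float64), theta=theta, circle=False)

    narrow = gaussian_filter1d(sinogram, sigma_narrow, axis=0)
    wide = gaussian_filter1d(sinogram, sigma_wide, axis=0)

    return iradon(narrow - wide, theta=theta, circle=False)
```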


Journal ArticleDOI
15 Jul 2005-Spine
TL;DR: In this paper, a measurement technique to assess dynamic motion of the lumbar spine using enhanced digital fluoroscopic video (DFV) and a distortion compensated roentgen analysis (DCRA) was developed.
Abstract: Study Design. Methodological reliability. Objective. Develop a measurement technique to assess dynamic motion of the lumbar spine using enhanced digital fluoroscopic video (DFV) and a distortion compensated roentgen analysis (DCRA). Summary of Background Data. Controversy over both the definition and consequences of lumbar segmental instability persists. Information from static imaging has had limited success in providing an understanding of this disorder. DFV has the potential to provide further information about lumbar segmental instability; however, the image quality is poor and clinical application is limited. Methods. DFVs from 20 male subjects (11 with and 9 without low back pain) were obtained during eccentric lumbar flexion (30 Hz). Each DFV was enhanced with a series of filters to accentuate the vertebral edges. An adapted DCRA algorithm was applied to determine segmental angular and linear displacement. Both intraimage and interimage reliability were assessed using intraclass correlation coefficients (ICC) and standard error of the measurement (SEM). Results. Intraimage reliability yielded an average ICC of 0.986, and the SEM ranged from 0.4–0.7° and 0.2–0.3 mm. Interimage reliability yielded an average ICC of 0.878, and the SEM ranged from 0.7–1.4° and 0.4–0.7 mm. Conclusions. Enhanced DFV combined with a DCRA resulted in reliable assessment of lumbar spine kinematics. The error values associated with this technique were low and were comparable to published error measurements obtained when using a similar algorithm on hand-drawn outlines from static radiographs.

61 citations
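
Reliability in the study is summarized with intraclass correlation coefficients and the standard error of measurement. The snippet below is a generic ICC(2,1) and SEM computation from a two-way ANOVA decomposition over a subjects-by-raters table; it follows the standard Shrout and Fleiss formulation and is not code from the paper, and SEM = SD * sqrt(1 - ICC) is just one common convention.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) (two-way random effects, absolute agreement, single measure) and SEM.

    scores: (n_subjects, k_raters) array of repeated measurements.
    """
    scores = np.asarray(scores, dtype=np.float64)
    n, k = scores.shape
    grand = scores.mean()

    ss_rows = k * np.sum((scores.mean(axis=1) - grand) ** 2)   # subjects
    ss_cols = n * np.sum((scores.mean(axis=0) - grand) ** 2)   # raters
    ss_err = np.sum((scores - grand) ** 2) - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    sem = scores.std(ddof=1) * np.sqrt(1.0 - icc)
    return icc, sem
```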


Journal ArticleDOI
TL;DR: An entropy minimization algorithm is derived and it is found that it performs optimally at reducing total contrast uncertainty and that it also works well at reducing the mean squared error between the original image and the image reconstructed from the multiple fixations.
Abstract: The human visual system combines a wide field of view with a high-resolution fovea and uses eye, head, and body movements to direct the fovea to potentially relevant locations in the visual scene. This strategy is sensible for a visual system with limited neural resources. However, for this strategy to be effective, the visual system needs sophisticated central mechanisms that efficiently exploit the varying spatial resolution of the retina. To gain insight into some of the design requirements of these central mechanisms, we have analyzed the effects of variable spatial resolution on local contrast in 300 calibrated natural images. Specifically, for each retinal eccentricity (which produces a certain effective level of blur), and for each value of local contrast observed at that eccentricity, we measured the probability distribution of the local contrast in the unblurred image. These conditional probability distributions can be regarded as posterior probability distributions for the “true” unblurred contrast, given an observed contrast at a given eccentricity. We find that these conditional probability distributions are adequately described by a few simple formulas. To explore how these statistics might be exploited by central perceptual mechanisms, we consider the task of selecting successive fixation points, where the goal on each fixation is to maximize total contrast information gained about the image (i.e., minimize total contrast uncertainty). We derive an entropy minimization algorithm and find that it performs optimally at reducing total contrast uncertainty and that it also works well at reducing the mean squared error between the original image and the image reconstructed from the multiple fixations. Our results show that measurements of local contrast alone could efficiently drive the scan paths of the eye when the goal is to gain as much information about the spatial structure of a scene as possible.

53 citations
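
The entropy-minimization algorithm itself is not reproduced here. The toy loop below only conveys the flavor of fixation selection under a stand-in model (an assumption, not the paper's statistics) in which the remaining uncertainty at a pixel is proportional to its local RMS contrast and shrinks as the pixel gets closer to some fixation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_rms_contrast(img, win=16):
    img = img.astype(np.float64)
    mu = uniform_filter(img, win)
    var = np.maximum(uniform_filter(img * img, win) - mu ** 2, 0.0)
    return np.sqrt(var) / (mu + 1e-6)

def greedy_fixations(img, n_fix=5, half_res_px=64):
    """Greedily fixate the current peak of a contrast-uncertainty surrogate.

    Stand-in model: uncertainty(x) = contrast(x) * d(x) / (d(x) + half_res_px),
    where d(x) is the distance from pixel x to the nearest fixation chosen so far.
    """
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    contrast = local_rms_contrast(img)
    dist = np.full((h, w), float(np.hypot(h, w)))   # "no fixation yet" distance
    fixations = []
    for _ in range(n_fix):
        uncertainty = contrast * dist / (dist + half_res_px)
        fy, fx = np.unravel_index(np.argmax(uncertainty), uncertainty.shape)
        fixations.append((int(fy), int(fx)))
        dist = np.minimum(dist, np.hypot(yy - fy, xx - fx))
    return fixations
```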




Proceedings ArticleDOI
01 Jan 2005
TL;DR: A multiple description image coding scheme in the wavelet domain using quantized frame expansions is proposed and a tight frame operator is applied to the zerotrees to evaluate the performance of the scheme over an erasure channel.
Abstract: Multiple description codes generated by quantized frame expansions have been shown to perform well on erasure channels when compared to traditional channel codes. In this paper we propose a multiple description image coding scheme in the wavelet domain using quantized frame expansions. We form zerotrees from wavelet coefficients and apply a tight frame operator to the zerotrees. We then group appropriate expansions to form packets and evaluate the performance of the scheme over an erasure channel. We compare the performance of the proposed scheme with a conventional channel coding scheme.
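
The zerotree construction and packetization are not shown here. The toy sketch below only illustrates the quantized frame-expansion idea: expand a coefficient vector with an overcomplete Parseval tight frame, quantize the expansion coefficients, drop some of them to simulate lost packets, and reconstruct from the survivors by least squares. Frame size, quantizer step, and erasure pattern are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def parseval_frame(n, k):
    """Rows of an (n, k) matrix F with F.T @ F = I form a Parseval tight frame for R^k."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q[:, :k]

def mdc_roundtrip(x, n_expansions=12, step=0.05, erased=(1, 4, 7)):
    """Quantized frame expansion of a coefficient vector with simulated erasures.

    Each expansion coefficient would travel in a different packet; 'erased' indexes
    the lost packets, and the surviving coefficients are inverted by least squares.
    """
    x = np.asarray(x, dtype=np.float64)
    F = parseval_frame(n_expansions, len(x))
    y = np.round(F @ x / step) * step                       # quantized frame coefficients
    keep = np.setdiff1d(np.arange(n_expansions), erased)
    x_hat, *_ = np.linalg.lstsq(F[keep], y[keep], rcond=None)
    return x_hat

x = rng.standard_normal(8)
print(np.max(np.abs(x - mdc_roundtrip(x))))                 # error on the order of the step size
```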


Proceedings ArticleDOI
29 Apr 2005
TL;DR: A new algorithm and preliminary results for classifying lesions into BI-RADS shape categories: round, oval, lobulated, or irregular are presented, which could potentially be used in conjunction with a CAD system to enable greater interaction and personalization.
Abstract: We present a new algorithm and preliminary results for classifying lesions into BI-RADS shape categories: round, oval, lobulated, or irregular. By classifying masses into one of these categories, computer aided detection (CAD) systems will be able to provide additional information to radiologists. Thus, such a tool could potentially be used in conjunction with a CAD system to enable greater interaction and personalization. For this classification task, we have developed a new set of features using the Beamlet transform, which is a recently developed multi-scale image analysis transform. We trained a k-Nearest Neighbor classifier using images from the Digital Database for Screening Mammography (DDSM). The method was tested on a set of 25 images of each type, and we obtained a classification accuracy of 78% for classifying masses as oval or round and an accuracy of 72% for classifying masses as lobulated or round.
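
The beamlet feature extraction is the substance of the paper and is not reproduced here. The sketch below shows only the classification stage, using scikit-learn's k-nearest-neighbor classifier with placeholder feature vectors standing in for the beamlet-derived shape descriptors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: each row is a per-lesion feature vector (in the paper these
# would come from the beamlet transform); labels are BI-RADS shape categories.
rng = np.random.default_rng(0)
X = rng.random((100, 12))
y = rng.choice(["round", "oval", "lobulated", "irregular"], size=100)

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=5)   # random features give chance-level accuracy
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```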

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This work presents an unequal power allocation scheme for transmission of JPEG compressed images over multiple-input multiple-output systems employing spatial multiplexing, and shows that this scheme provides significant image quality improvement as compared to different equal power schemes.
Abstract: With the introduction of multiple transmit and receive antennas in next generation wireless systems, real-time image and video communication are expected to become quite common, since very high data rates will become available along with improved data reliability. New joint transmission and coding schemes that explore advantages of multiple antenna systems matched with source statistics are expected to be developed. Based on this idea, we present an unequal power allocation scheme for transmission of JPEG compressed images over multiple-input multiple-output systems employing spatial multiplexing. The JPEG-compressed image is divided into different quality layers, and different layers are transmitted simultaneously from different transmit antennas using unequal transmit power, with a constraint on the total transmit power during any symbol period. Results show that our unequal power allocation scheme provides significant image quality improvement as compared to different equal power allocation schemes, with the peak signal-to-noise ratio gain as high as 14 dB at low signal-to-noise ratios.
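
The paper's allocation rule is not reproduced here; the toy function below only illustrates the constraint structure: split a fixed per-symbol power budget across JPEG quality layers (one per transmit antenna), giving more power to the more important layers. Using a layer-importance weight as the allocation key is an assumed proxy, not the paper's optimization.

```python
import numpy as np

def allocate_power(layer_importance, total_power=1.0):
    """Unequal per-antenna power allocation under a fixed total-power budget.

    layer_importance: nonnegative weights, e.g., the distortion incurred if the
    corresponding JPEG quality layer were lost. Returns powers summing to total_power.
    """
    w = np.asarray(layer_importance, dtype=np.float64)
    return total_power * w / w.sum()

# Example with four layers: the base layer gets the largest share of a 4-unit budget
print(allocate_power([0.6, 0.25, 0.1, 0.05], total_power=4.0))   # [2.4 1.  0.4 0.2]
```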


Journal ArticleDOI
TL;DR: The proposed foveation embedded DCT domain video transcoding can reduce the bit rate without compromising visual quality or achieve better subjective quality for a given bit rate by shaping the compression distortion according to the foveated contrast sensitivity function of the HVS.
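
For a feel of what shaping distortion according to a foveated contrast sensitivity function involves, the sketch below builds an eccentricity-dependent weight map from a commonly used contrast-threshold model, CT(f, e) = ct0 * exp(alpha * f * (e + e2) / e2). The constants and the pixel-to-degree conversion are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def foveation_weight_map(h, w, fixation_xy, view_dist_px,
                         alpha=0.106, e2=2.3, ct0=1.0 / 64):
    """Weight map that is 1.0 at the fixation point and decays with eccentricity.

    The cutoff spatial frequency at eccentricity e (degrees) under the model above
    is fc(e) = e2 * ln(1/ct0) / (alpha * (e + e2)); weights are fc normalized to 1.
    """
    y, x = np.mgrid[0:h, 0:w]
    d_px = np.hypot(x - fixation_xy[0], y - fixation_xy[1])
    ecc_deg = np.degrees(np.arctan(d_px / view_dist_px))
    fc = e2 * np.log(1.0 / ct0) / (alpha * (ecc_deg + e2))
    return fc / fc.max()
```

In a foveated coder or transcoder, a map like this would scale quantization strength or bit allocation so that regions far from the fixation tolerate more compression distortion.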

Journal ArticleDOI
TL;DR: This paper introduces a prototype foveated visual communication system suitable for implementation as a core element of an interactive multimedia wireless communication environment and demonstrates the benefit of using foveation in noisy wireless low bandwidth applications.

Proceedings ArticleDOI
14 Nov 2005
TL;DR: This paper presents novel techniques for detecting watermarks in images in a known-cover attack framework using natural scene models, and indicates that this statistical framework is effective in the steganalysis of spread spectrum watermarks.
Abstract: This paper presents novel techniques for detecting watermarks in images in a known-cover attack framework using natural scene models. Specifically, we consider a class of watermarking algorithms popularly known as spread spectrum-based techniques. We attempt to classify images as either watermarked or distorted by common signal processing operations such as compression and additive noise. The basic idea is that the statistical distortion introduced by spread spectrum watermarking is very different from that introduced by other common distortions. Our results are very promising and indicate that this statistical framework is effective in the steganalysis of spread spectrum watermarks.
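
The paper's classifier is not reproduced here. The toy sketch below only illustrates the stated intuition in a known-cover setting: the residual left by additive spread-spectrum embedding is an unstructured pseudo-random field, while the residual left by compression-style distortions concentrates on image structure and is heavy-tailed, so even a simple residual statistic separates the two on a synthetic example. The embedding strength, pattern, and test statistic are all assumptions.

```python
import numpy as np
from scipy.stats import kurtosis
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def embed_spread_spectrum(cover, strength=2.0, key=42):
    """Additive spread-spectrum embedding: cover + strength * pseudo-random Gaussian pattern."""
    pattern = np.random.default_rng(key).standard_normal(cover.shape)
    return cover + strength * pattern

def residual_kurtosis(cover, test_image):
    """Known-cover statistic: excess kurtosis of (test - cover).

    A pseudo-random watermark residual is near-Gaussian (excess kurtosis near 0),
    while blur/compression residuals concentrate around edges and are heavy-tailed.
    """
    return kurtosis((test_image - cover).ravel(), fisher=True)

# Synthetic cover with a step edge standing in for image structure
cover = np.zeros((128, 128))
cover[:, 64:] = 100.0
cover += rng.normal(0.0, 5.0, size=cover.shape)

print("watermarked:", residual_kurtosis(cover, embed_spread_spectrum(cover)))
print("blurred    :", residual_kurtosis(cover, gaussian_filter(cover, 1.5)))
```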

Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is proposed that a memorability map of a complex natural scene may be constructed to represent the low-level memorability of local regions in a similar fashion to the familiar saliency map, which records bottom-up fixation attractors.
Abstract: Recent years have seen a resurgent interest in eye movements during natural scene viewing. Aspects of eye movements that are driven by low-level image properties are of particular interest due to their applicability to biologically motivated artificial vision and surveillance systems. In this paper, we report an experiment in which we recorded observers’ eye movements while they viewed calibrated greyscale images of natural scenes. Immediately after viewing each image, observers were shown a test patch and asked to indicate if they thought it was part of the image they had just seen. The test patch was either randomly selected from a different image from the same database or, unbeknownst to the observer, selected from either the first or last location fixated on the image just viewed. We find that several low-level image properties differed significantly relative to the observers’ ability to successfully designate each patch. We also find that the differences between patch statistics for first and last fixations are small compared to the differences between hit and miss responses. The goal of the paper was to measure, in a non-cognitive natural setting, the image properties that facilitate visual memory, and additionally to observe the role played by the temporal location (first or last fixation) of the test patch. We propose that a memorability map of a complex natural scene may be constructed to represent the low-level memorability of local regions, in a similar fashion to the familiar saliency map, which records bottom-up fixation attractors.

Proceedings ArticleDOI
01 Dec 2005
TL;DR: It is demonstrated how a novel characterization of the contrast statistics of natural images can be used for selecting fixation points that minimize the total contrast uncertainty (entropy) of natural images.
Abstract: In this paper we address the problem of visual surveillance, which we define as the problem of optimally extracting information from the visual scene with a fixating, foveated imaging system. We are explicitly concerned with eye/camera movement strategies that result in maximizing information extraction from the visual field. Here we demonstrate how a novel characterization of the contrast statistics of natural images can be used for selecting fixation points that minimize the total contrast uncertainty (entropy) of natural images. We demonstrate the performance of the algorithm and compare its performance to ground truth methods. The results show that our algorithm performs favorably in terms of both efficiency and its ability to find salient features in the image.

Book ChapterDOI
01 Jan 2005

Book ChapterDOI
01 Dec 2005
TL;DR: In this chapter, the authors describe tools and techniques from four leading universities that facilitate a gentle introduction to fascinating concepts in digital image processing; equipped with informative visualizations and user-friendly interfaces, these modules are currently being used effectively in a classroom environment for teaching DIP at many universities across the world.
Abstract: This chapter describes tools and techniques that facilitate a gentle introduction to fascinating concepts in digital image processing from four leading universities. Equipped with informative visualizations and user-friendly interfaces, these modules are currently being used effectively in a classroom environment for teaching DIP at many universities across the world.

Journal ArticleDOI
TL;DR: Theorems that place limits on the point-wise approximation of the responses of filters, both linear shift invariant (LSI) and linear shift variant (LSV), to input signals and images that are LSV in the following sense are developed.
Abstract: We develop theorems that place limits on the point-wise approximation of the responses of filters, both linear shift invariant (LSI) and linear shift variant (LSV), to input signals and images that are LSV in the following sense: they can be expressed as the outputs of systems with LSV impulse responses, where the shift variance is with respect to the filter scale of a single-prototype filter. The approximations take the form of LSI approximations to the responses. We develop tight bounds on the approximation errors expressed in terms of filter durations and derivative (Sobolev) norms. Finally, we find application of the developed theory to defoveation of images, deblurring of shift-variant blurs, and shift-variant edge detection.

Book ChapterDOI
01 Jan 2005

Proceedings ArticleDOI
18 Mar 2005
TL;DR: An error-resilient image communications application that uses the GSM model and multiple description coding (MDC) is presented; a rate-distortion bound for GSM random variables and the redundancy rate-distortion function are derived, and an MD image communication system is implemented.
Abstract: The statistics of natural scenes in the wavelet domain are accurately characterized by the Gaussian scale mixture (GSM) model. The model lends itself easily to analysis and many applications that use this model are emerging (e.g., denoising, watermark detection). We present an error-resilient image communications application that uses the GSM model and multiple description coding (MDC) to provide error-resilience. We derive a rate-distortion bound for GSM random variables, derive the redundancy rate-distortion function, and finally implement an MD image communication system.
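
The GSM model referenced here writes a neighborhood of wavelet coefficients as c = sqrt(z) * u, with u zero-mean Gaussian and z an independent positive multiplier; the rate-distortion derivations are not reproduced. The snippet below only samples from such a model, under assumed distributions for the covariance and the mixer z, to show the heavy-tailed marginals the model produces.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)

def sample_gsm(n_vectors=100_000, dim=4, log_sigma=0.6):
    """Draw c = sqrt(z) * u with u ~ N(0, C) and z lognormal (an assumed mixer).

    The random multiplier z modulates the local variance, which is what gives
    wavelet coefficients of natural images their characteristic heavy tails.
    """
    A = rng.standard_normal((dim, dim))
    C = A @ A.T / dim                                    # an arbitrary covariance
    u = rng.multivariate_normal(np.zeros(dim), C, size=n_vectors)
    z = rng.lognormal(mean=0.0, sigma=log_sigma, size=(n_vectors, 1))
    return np.sqrt(z) * u

c = sample_gsm()
print("excess kurtosis of one component:", kurtosis(c[:, 0]))   # > 0: heavier-tailed than Gaussian
```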

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This work compares approaches for generating binary control signals for variable acuity superpixel imager (VASI™) cameras, and measures five objective figures of merit after foveating a small set of test images with a variety of halftoning-inspired approaches.
Abstract: We compare approaches for generating binary control signals for variable acuity superpixel imager (VASI™) cameras. We foveate a small set of test images using control signals generated by various halftoning-inspired approaches and then measure their performance via five objective figures of merit (FOMs). We find that two novel approaches provide superior FOM values but inferior bandwidth control, making them unsuitable for use with VASI™ cameras. Floyd-Steinberg error diffusion gives the best combination of FOM values and bandwidth control. Our contributions include a comparison of approaches, a lookup table method to improve bandwidth control, and two novel methods for binarizing VASI™ control signals. The work targets an automatic target acquisition and recognition (ATA/ATR) application: VASI™ cameras have a number of characteristics that are attractive for ATR, but the user must specify a binary control signal that defines the camera's pixel-by-pixel behavior. The translation from a continuous-valued desired-resolution signal to a binary control signal can be based on halftoning; it must be efficient, to avoid lowering the camera's effective frame rate, and it must accurately achieve the target resolution, or equivalently a target bandwidth reduction expressed as a percentage of the original bandwidth (PBW), because this drives the frame rate achieved by the camera.
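
Floyd-Steinberg error diffusion is reported above as the best overall trade-off. The sketch below is a generic implementation that binarizes a continuous-valued desired-resolution map and reports the fraction of full-resolution pixels as a crude stand-in for the bandwidth (PBW) bookkeeping; it does not follow the VASI™ control-signal format or the lookup-table bandwidth correction mentioned in the abstract.

```python
import numpy as np

def floyd_steinberg(desired):
    """Binarize a map with values in [0, 1] by Floyd-Steinberg error diffusion.

    The quantization error at each pixel is diffused to the right and lower
    neighbors with the classic 7/16, 3/16, 5/16, 1/16 weights.
    """
    img = desired.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - out[y, x]
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out.astype(bool)

# Example: request full resolution near the image center, less elsewhere
yy, xx = np.mgrid[0:128, 0:128]
desired = np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / (2 * 30.0 ** 2))
control = floyd_steinberg(desired)
print("fraction of full-resolution pixels:", control.mean())
```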