
Showing papers by "Alan C. Bovik published in 2008"


Journal ArticleDOI
TL;DR: A new algorithm is presented that selects image regions as likely candidates for fixation, and these regions are shown to correlate well with fixations recorded from human observers.
Abstract: The ability to automatically detect visually interesting regions in images has many practical applications, especially in the design of active machine vision and automatic visual surveillance systems. Analysis of the statistics of image features at observers' gaze can provide insights into the mechanisms of fixation selection in humans. Using a foveated analysis framework, we studied the statistics of four low-level local image features: luminance, contrast, and bandpass outputs of both luminance and contrast, and discovered that image patches around human fixations had, on average, higher values of each of these features than image patches selected at random. Contrast-bandpass showed the greatest difference between human and random fixations, followed by luminance-bandpass, RMS contrast, and luminance. Using these measurements, we present a new algorithm that selects image regions as likely candidates for fixation. These regions are shown to correlate well with fixations recorded from human observers.

175 citations
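The low-level features described in the abstract above (luminance, RMS contrast, and bandpass-filtered versions of each) can be illustrated with a minimal numpy sketch. The patch size, filter scales, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def patch_features(img, center, radius=16):
    """Illustrative low-level features for a square patch around a candidate fixation."""
    y, x = center
    patch = img[y - radius:y + radius, x - radius:x + radius].astype(float)

    luminance = patch.mean()
    # RMS contrast: std of the patch normalized by its mean luminance.
    rms_contrast = patch.std() / (luminance + 1e-8)

    # Crude bandpass output: difference-of-Gaussians applied to the luminance patch.
    bandpass = gaussian_filter(patch, 1.0) - gaussian_filter(patch, 4.0)
    luminance_bandpass = np.abs(bandpass).mean()

    # "Contrast-bandpass": bandpass filtering applied to a local-contrast map.
    local_mean = gaussian_filter(patch, 2.0)
    contrast_map = (patch - local_mean) / (local_mean + 1e-8)
    cb = gaussian_filter(contrast_map, 1.0) - gaussian_filter(contrast_map, 4.0)
    contrast_bandpass = np.abs(cb).mean()

    return luminance, rms_contrast, luminance_bandpass, contrast_bandpass

# The statistics at human fixations could then be compared with those at random points:
# rng = np.random.default_rng(0)
# random_pts = [(rng.integers(16, H - 16), rng.integers(16, W - 16)) for _ in range(1000)]
```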


Journal ArticleDOI
TL;DR: Bounds on the structural similarity (SSIM) index are derived as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption.
Abstract: In this paper, we derive bounds on the structural similarity (SSIM) index as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption. The space domain SSIM index is first expressed in terms of the DCT coefficients of the space domain vectors. The transform domain SSIM index is then used to derive bounds on the average SSIM index as a function of quantization rate for uniform, Gaussian, and Laplacian sources. As an illustrative example, uniform quantization of the DCT coefficients of natural images is considered. We show that the SSIM index between the reference and quantized images falls within the bounds for a large set of natural images. Further, we show using a simple example that the proposed bounds could be very useful for rate allocation problems in practical image and video coding applications.

127 citations
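A toy numpy sketch of the experiment the abstract describes: uniformly quantize block-DCT coefficients at a fixed step size and measure a single-window SSIM value between the reference and quantized images. The block size and SSIM constants are conventional defaults, not values taken from the paper.

```python
import numpy as np
from scipy.fftpack import dct, idct

def block_dct2(x):  return dct(dct(x, axis=0, norm='ortho'), axis=1, norm='ortho')
def block_idct2(x): return idct(idct(x, axis=0, norm='ortho'), axis=1, norm='ortho')

def quantize_dct(img, step, block=8):
    """Uniformly quantize the block-DCT coefficients of an image with a fixed step size."""
    out = img.astype(float).copy()
    for i in range(0, img.shape[0] - block + 1, block):
        for j in range(0, img.shape[1] - block + 1, block):
            c = block_dct2(img[i:i + block, j:j + block].astype(float))
            out[i:i + block, j:j + block] = block_idct2(np.round(c / step) * step)
    return out

def ssim_patch(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Single-window SSIM between two same-sized patches (conventional constants)."""
    mx, my = x.mean(), y.mean()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (x.var() + y.var() + C2))
```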


Journal ArticleDOI
TL;DR: It is demonstrated that designing image processing algorithms, and, in particular, denoising and restoration-type algorithms, can yield significant gains over existing algorithms by optimizing them for perceptual distortion measures, and these gains may be obtained without significant increase in the computational complexity of the algorithm.
Abstract: We propose an algorithm for designing linear equalizers that maximize the structural similarity (SSIM) index between the reference and restored signals. The SSIM index has enjoyed considerable application in the evaluation of image processing algorithms. However, algorithms have not yet been designed to explicitly optimize for this measure. The design of such an algorithm is nontrivial due to the nonconvex nature of the distortion measure. In this paper, we reformulate the nonconvex problem as a quasi-convex optimization problem, which admits a tractable solution. We compute the optimal solution in near closed form, with the complexity of the resulting algorithm comparable to that of the linear minimum mean squared error (MMSE) solution, independent of the number of filter taps. To demonstrate the usefulness of the proposed algorithm, it is applied to restore images that have been blurred and corrupted with additive white Gaussian noise. As a special case, we consider blur-free image denoising. In each case, its performance is compared to that of a locally adaptive linear MSE-optimal filter. We show that the images denoised and restored using the SSIM-optimal filter have a higher SSIM index and superior perceptual quality compared to those restored using the MSE-optimal adaptive linear filter. Through these results, we demonstrate that (a) designing image processing algorithms, and, in particular, denoising and restoration-type algorithms, can yield significant gains over existing (in particular, linear MMSE-based) algorithms by optimizing them for perceptual distortion measures, and (b) these gains may be obtained without significant increase in the computational complexity of the algorithm.

94 citations
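The reformulation described above turns SSIM maximization into a quasi-convex problem that can be solved by bisecting on the achievable SSIM level. The sketch below only illustrates that bisection strategy on a simplified ratio-of-quadratics objective; the matrices A and b and the constants are placeholders, not the paper's statistical model or its closed-form solution.

```python
import numpy as np

def max_ratio_by_bisection(A, b, d, C, lo=0.0, hi=1.0, iters=50):
    """Maximize (2 w'b + C) / (w'Aw + d + C) over filter coefficients w by bisecting
    on the achievable level t.  For fixed t, the feasibility test
        max_w [ 2 w'b + C - t (w'Aw + d + C) ] >= 0
    is a concave quadratic maximization with closed-form maximizer w = inv(A) b / t.
    """
    A_inv_b = np.linalg.solve(A, b)
    w_best = np.zeros_like(b)
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        w = A_inv_b / t
        val = 2 * w @ b + C - t * (w @ A @ w + d + C)
        if val >= 0:          # level t is achievable: raise the lower bound
            lo, w_best = t, w
        else:                 # level t is unreachable: lower the upper bound
            hi = t
    return w_best, lo
```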


Proceedings ArticleDOI
12 Dec 2008
TL;DR: The SSIM model is shown to be equivalent to models of contrast gain control of the HVS, and the Information Fidelity Criterion is shown to be a monotonic function of the structure term of the SSIM index applied in the sub-band filtered domain.

Abstract: This paper studies two increasingly popular paradigms for image quality assessment - Structural SIMilarity (SSIM) metrics and Information Fidelity metrics. The relation of the SSIM metric to Mean Squared Error and Human Visual System (HVS) based models of quality assessment is studied. The SSIM model is shown to be equivalent to models of contrast gain control of the HVS. We study the information theoretic metrics and show that the Information Fidelity Criterion (IFC) is a monotonic function of the structure term of the SSIM index applied in the sub-band filtered domain. Our analysis of the Visual Information Fidelity (VIF) criterion shows that improvements in VIF include incorporation of a contrast comparison, in addition to the structure comparison in IFC. Our analysis attempts to unify quality metrics derived from different first principles and characterize the relative performance of different QA systems.

85 citations


Journal ArticleDOI
TL;DR: This study converted distances from accurate range maps of forest scenes and indoor scenes into disparities that an observer would encounter, given an eye model and fixation distances, and found that the distributions of natural disparities in these two kinds of scenes are centered at zero, have high peaks, and span about 5 deg, which closely matches the macaque MT cells' disparity tuning range.
Abstract: Binocular disparity is the input to stereopsis, which is a very strong depth cue in humans. However, the distribution of binocular disparities in natural environments has not been quantitatively measured. In this study, we converted distances from accurate range maps of forest scenes and indoor scenes into the disparities that an observer would encounter, given an eye model and fixation distances (which we measured for the forest environment, and simulated for the indoor environment). We found that the distributions of natural disparities in these two kinds of scenes are centered at zero, have high peaks, and span about 5 deg, which closely matches the macaque MT cells' disparity tuning range. These ranges are fully within the operational range of human stereopsis determined psychophysically. Suprathreshold disparities (>10 arcsec) are common rather than exceptional. There is a prevailing notion that stereopsis only operates within a few meters, but our finding suggests that we should rethink the role of stereopsis at far viewing distances because of the abundance of suprathreshold disparities.

82 citations
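Converting range to disparity, as described above, amounts to taking the difference in vergence angle between the fixated point and each scene point. A minimal sketch under a small-angle approximation, with an assumed (typical) interpupillary distance:

```python
import numpy as np

IPD = 0.065                      # interpupillary distance in meters (typical adult value)
ARCSEC_PER_RAD = 180.0 / np.pi * 3600.0

def disparity_arcsec(point_dist, fixation_dist, ipd=IPD):
    """Small-angle approximation: horizontal disparity is the difference between the
    vergence angle of a scene point and that of the fixated point.
    Sign convention here: negative = uncrossed (point beyond fixation)."""
    vergence_pt = ipd / point_dist       # radians
    vergence_fix = ipd / fixation_dist   # radians
    return (vergence_pt - vergence_fix) * ARCSEC_PER_RAD

# Example: a point 0.5 m beyond a 2 m fixation distance gives roughly -1340 arcsec
# (an uncrossed disparity well above typical stereo thresholds):
# disparity_arcsec(2.5, 2.0)
```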


Journal ArticleDOI
TL;DR: The authors have invented a new class of linear filters, spiculated lesion filters, for the detection of converging lines or spiculations, and invented a novel technique to enhance spicules on mammograms.
Abstract: The detection of lesions on mammography is a repetitive and fatiguing task. Thus, computer-aided detection systems have been developed to aid radiologists. The detection accuracy of current systems is much higher for clusters of microcalcifications than for spiculated masses. In this article, the authors present a new model-based framework for the detection of spiculated masses. The authors have invented a new class of linear filters, spiculated lesion filters, for the detection of converging lines or spiculations. These filters are highly specific narrowband filters, which are designed to match the expected structures of spiculated masses. As a part of this algorithm, the authors have also invented a novel technique to enhance spicules on mammograms. This entails filtering in the radon domain. They have also developed models to reduce the false positives due to normal linear structures. A key contribution of this work is that the parameters of the detection algorithm are based on measurements of physical properties of spiculated masses. The results of the detection algorithm are presented in the form of free-response receiver operating characteristic curves on images from the Mammographic Image Analysis Society and Digital Database for Screening Mammography databases.

58 citations
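The spiculated lesion filters themselves are the paper's contribution and are not reproduced here; the sketch below only illustrates the general idea of accumulating responses of oriented, narrowband line filters to flag locations where many lines converge. All parameters and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def oriented_line_kernel(theta, length=15, sigma=1.5):
    """Thin oriented ridge detector: a line through the origin at angle theta with a
    Gaussian cross profile, made zero-mean so flat regions give no response."""
    half = length // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    dist = np.abs(-xs * np.sin(theta) + ys * np.cos(theta))
    k = np.exp(-dist ** 2 / (2 * sigma ** 2))
    return k - k.mean()

def spiculation_score(img, n_orient=12):
    """Count, at each pixel, how many orientations respond strongly -- a crude proxy
    for 'converging lines arriving from many directions'."""
    responses = np.stack([convolve(img.astype(float), oriented_line_kernel(t))
                          for t in np.linspace(0, np.pi, n_orient, endpoint=False)])
    thresh = responses.mean() + responses.std()
    return (responses > thresh).sum(axis=0)
```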


Proceedings ArticleDOI
08 Dec 2008
TL;DR: A new framework for personal identity verification using 3-D geometry of the face is introduced and a comparison is made between the two alternative curve-based facial surface representations.
Abstract: In this paper, a new framework for personal identity verification using 3-D geometry of the face is introduced. Initially, 3-D facial surfaces are represented by curves extracted from facial surfaces (facial curves). Two alternative facial curves are examined in this research: iso-depth and iso-geodesic curves. Iso-depth curves are produced by intersecting a facial surface with parallel planes perpendicular to the direction of gaze, at different depths from the nose tip. An iso-geodesic curve is defined to be the locus of all points on the facial surface having the same geodesic distance from a given facial landmark (e.g., the nose tip). Once the facial curves are extracted, their characteristics are encoded by several features, such as shape descriptors or polar Euclidean distances from the origin (nose tip). The final step is to verify or reject requests from users claiming the identity of registered individuals (gallery members) by comparing their features using a Euclidean distance classifier or a support vector machine (SVM). The performance results of the identity verification experiments are reported, and a comparison is made between the two alternative curve-based facial surface representations.

38 citations
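A minimal numpy sketch of the iso-depth representation described above, assuming the face is given as a range image aligned with the gaze direction: points at a fixed depth offset from the nose tip form one curve, which can then be encoded by polar Euclidean distances from the nose tip. The offsets, tolerance, and bin count are illustrative assumptions.

```python
import numpy as np

def iso_depth_curves(depth, nose_tip, offsets=(5, 10, 15, 20), tol=0.5):
    """Return boolean masks of pixels lying (within a tolerance) on planes at fixed
    depth offsets behind the nose tip. Units follow the range data (e.g. millimetres)."""
    z0 = depth[nose_tip]
    return {d: np.abs((depth - z0) - d) < tol for d in offsets}

def polar_signature(mask, nose_tip, n_bins=64):
    """Encode one curve by the mean Euclidean distance from the nose tip in angular bins."""
    ys, xs = np.nonzero(mask)
    dy, dx = ys - nose_tip[0], xs - nose_tip[1]
    r = np.hypot(dy, dx)
    ang = np.digitize(np.arctan2(dy, dx), np.linspace(-np.pi, np.pi, n_bins + 1)) - 1
    return np.array([r[ang == b].mean() if np.any(ang == b) else 0.0
                     for b in range(n_bins)])
```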


01 Jan 2008
TL;DR: A new quality metric for range images is proposed that is based on the multi-scale Structural Similarity (MS-SSIM) Index; it operates in a manner similar to SSIM but allows for special handling of missing data.

Abstract: We propose a new quality metric for range images that is based on the multi-scale Structural Similarity (MS-SSIM) Index. The new metric operates in a manner similar to SSIM but allows for special handling of missing data. We demonstrate its utility by reevaluating the set of stereo algorithms evaluated on the Middlebury Stereo Vision Page (http://vision.middlebury.edu/stereo/). The new metric, which we term the Range SSIM (R-SSIM) Index, possesses features that make it an attractive choice for assessing the quality of range images.

31 citations
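A single-scale sketch of the masking idea behind such a metric: compute local SSIM only over windows whose range samples are all valid, skipping windows that contain missing data. This is an assumption about the flavor of the special handling, not the authors' R-SSIM definition.

```python
import numpy as np

def masked_ssim(ref, test, valid, win=7, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Mean local SSIM over windows whose pixels are all valid in both range maps.
    `valid` is a boolean mask: True where range data exist, False where data are missing."""
    scores = []
    H, W = ref.shape
    for i in range(0, H - win + 1, win):
        for j in range(0, W - win + 1, win):
            if not valid[i:i + win, j:j + win].all():
                continue                       # special handling: skip windows with holes
            x = ref[i:i + win, j:j + win].astype(float)
            y = test[i:i + win, j:j + win].astype(float)
            mx, my = x.mean(), y.mean()
            cxy = ((x - mx) * (y - my)).mean()
            s = ((2 * mx * my + C1) * (2 * cxy + C2)) / \
                ((mx ** 2 + my ** 2 + C1) * (x.var() + y.var() + C2))
            scores.append(s)
    return float(np.mean(scores)) if scores else float('nan')
```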


Proceedings ArticleDOI
12 May 2008
TL;DR: It is shown using these examples that optimizing equalizers for the SSIM index does indeed result in higher perceptual image quality compared to equalizers optimized for the ubiquitous mean squared error (MSE).
Abstract: In this paper, we present an algorithm for designing a linear equalizer that is optimal with respect to the structural similarity (SSIM) index. The optimization problem is shown to be non-convex, thereby making it non-trivial. The non-convex problem is first converted to a quasi-convex problem and then solved using a combination of first order necessary conditions and bisection search. To demonstrate the usefulness of this solution, it is applied to image denoising and image restoration examples. We show using these examples that optimizing equalizers for the SSIM index does indeed result in higher perceptual image quality compared to equalizers optimized for the ubiquitous mean squared error (MSE).

30 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: A superior verification accuracy was obtained using the range data, and a highly competitive accuracy to that of other techniques in the literature was also obtained for the portrait data.
Abstract: In this paper, we present a novel identity verification system based on Gabor features extracted from range (3D) representations of faces. Multiple landmarks (fiducials) on a face are automatically detected using these Gabor features. Once the landmarks are identified, the Gabor features on all fiducials of a face are concatenated to form a feature vector for that particular face. Linear discriminant analysis (LDA) is used to reduce the dimensionality of the feature vector while maximizing the discrimination power. These novel features were tested on 1196 range images. The same features were also extracted from portrait images, and the accuracies of both modalities were compared. A superior verification accuracy was obtained using the range data, and a highly competitive accuracy to that of other techniques in the literature was also obtained for the portrait data.

29 citations


Proceedings ArticleDOI
24 Mar 2008
TL;DR: In this article, the appearance of each feature point is encoded using a set of Gabor wavelet responses extracted at multiple orientations and spatial frequencies, which are computed at each pixel in the search window on a fiducial.
Abstract: We propose a novel technique to detect feature points from portrait and range representations of the face. In this technique, the appearance of each feature point is encoded using a set of Gabor wavelet responses extracted at multiple orientations and spatial frequencies. A vector of Gabor coefficients, called a jet, is computed at each pixel in the search window on a fiducial and compared with a set of jets, called a bunch, collected from a set of training data on the same type of fiducial. The desired feature point is located at the pixel whose jet is the most similar to the training bunch. This is the first time that Gabor wavelet responses have been used to detect facial landmarks from range images. The method was tested on 1146 pairs of range and portrait images, and high detection accuracies were achieved using a small number of training images. It is shown that co-localization using Gabor jets on range and portrait images results in better accuracy than using any single image modality. The obtained accuracies are competitive with those of other techniques in the literature.
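A compact numpy sketch of the jet-and-bunch matching described above: a jet is a vector of Gabor magnitude responses at one pixel, and a fiducial is located at the pixel in the search window whose jet is most similar to any jet in the training bunch. The filter-bank parameters and the similarity measure are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(freq, theta, size=21, sigma=4.0):
    """Complex Gabor kernel at one spatial frequency and orientation."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    rot = xs * np.cos(theta) + ys * np.sin(theta)
    env = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
    return env * np.exp(2j * np.pi * freq * rot)

BANK = [gabor_kernel(f, t) for f in (0.05, 0.1, 0.2)
                            for t in np.linspace(0, np.pi, 8, endpoint=False)]

def jet(img, y, x, size=21):
    """Vector of Gabor magnitude responses ('jet') at one pixel."""
    half = size // 2
    patch = img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    return np.array([np.abs((patch * np.conj(k)).sum()) for k in BANK])

def jet_similarity(j1, j2):
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-12))

def locate_fiducial(img, search_window, bunch):
    """Return the pixel in the search window whose jet best matches any training jet."""
    (y0, y1), (x0, x1) = search_window
    best, best_pt = -np.inf, None
    for y in range(y0, y1):
        for x in range(x0, x1):
            s = max(jet_similarity(jet(img, y, x), b) for b in bunch)
            if s > best:
                best, best_pt = s, (y, x)
    return best_pt, best
```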

Journal ArticleDOI
TL;DR: A new feature normalization method is introduced for M-FISH images that reduces the difference in the feature distributions among different images using the expectation maximization (EM) algorithm and is as accurate as the maximum-likelihood classifier, whose accuracy also significantly improved after the EM normalization.
Abstract: Multicolor fluorescence in situ hybridization (M-FISH) techniques provide color karyotyping that allows simultaneous analysis of numerical and structural abnormalities of whole human chromosomes. Chromosomes are stained combinatorially in M-FISH. By analyzing the intensity combinations of each pixel, all chromosome pixels in an image are classified. Often, the intensity distributions between different images are found to be considerably different, and this difference becomes a source of pixel misclassifications. Improving pixel classification accuracy is the most important task in ensuring the success of the M-FISH technique. In this paper, we introduce a new feature normalization method for M-FISH images that reduces the difference in the feature distributions among different images using the expectation maximization (EM) algorithm. We also introduce a new unsupervised, nonparametric classification method for M-FISH images. The performance of the classifier is as accurate as the maximum-likelihood classifier, whose accuracy also significantly improved after the EM normalization. We would expect that any classifier will likely produce improved classification accuracy following the EM normalization. Since the developed classification method does not require training data, it is highly convenient when ground truth does not exist. A significant improvement was achieved in pixel classification accuracy after the new feature normalization. Indeed, the overall pixel classification accuracy improved by 20% after EM normalization.
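The exact normalization procedure is the paper's contribution; as a hedged illustration of the general idea, the sketch below fits a two-component Gaussian mixture to one fluorescence channel with EM and standardizes the channel by the fitted "signal" component, which is one way of reducing between-image distribution differences. The component choice and scaling are assumptions, not the paper's scheme.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_normalize_channel(values, n_components=2):
    """Fit a Gaussian mixture to one fluorescence channel with EM, then rescale the
    channel so the brighter ('signal') component has zero mean and unit variance.
    Repeating this per channel and per image aligns the feature distributions."""
    x = values.reshape(-1, 1).astype(float)
    gm = GaussianMixture(n_components=n_components, random_state=0).fit(x)
    sig = int(np.argmax(gm.means_.ravel()))          # component with the larger mean
    mu = gm.means_.ravel()[sig]
    sd = np.sqrt(gm.covariances_.ravel()[sig])
    return (values - mu) / (sd + 1e-12)
```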

Journal ArticleDOI
TL;DR: An algorithm for enhancement of spicules of spiculated masses, which uses the discrete radon transform, is developed, and it is found that most observers preferred the enhanced images generated with the fast slant stack (FSS) method.

Abstract: We have developed an algorithm for enhancement of spicules of spiculated masses, which uses the discrete radon transform. Previously, we employed a commonly used method to compute the discrete radon transform, which we refer to as the DRT. Recently, a new, more exact method to compute the discrete radon transform was developed by Averbuch et al., which is called the fast slant stack (FSS) method. Our hypothesis was that this new formulation would help to improve our enhancement algorithm. To test this idea, we conducted multiple two-alternative-forced-choice observer studies and found that most observers preferred the enhanced images generated with the FSS method.

Proceedings ArticleDOI
25 Mar 2008
TL;DR: Bounds on the structural similarity (SSIM) index are derived as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption.

Abstract: In this paper, we derive bounds on the structural similarity (SSIM) index as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption. The space domain SSIM index is first expressed in terms of the DCT coefficients of the space domain vectors. The transform domain SSIM index is then used to derive bounds on the average SSIM index as a function of quantization rate for Gaussian and Laplacian sources. As an illustrative example, uniform quantization of the DCT coefficients of natural images is considered. We show that the SSIM index between the reference and quantized images falls within the bounds for a large set of natural images. Further, we show using a simple example that the proposed bounds could be very useful for rate allocation problems in practical image and video coding applications.

Journal ArticleDOI
TL;DR: The classical independent component analysis (ICA) decomposition is refined using a multilinear expansion of the probability density function of the source statistics, and a specific nonlinear system is introduced that elegantly captures the statistical dependences between the responses of the multilinear ICA (MICA) filters.

Abstract: We refine the classical independent component analysis (ICA) decomposition using a multilinear expansion of the probability density function of the source statistics. In particular, we introduce a specific nonlinear system that allows us to elegantly capture the statistical dependences between the responses of the multilinear ICA (MICA) filters. The resulting multilinear probability density is analytically tractable and does not require Monte Carlo simulations to estimate the model parameters. We demonstrate the MICA model on natural image textures and envision that the new model will prove useful for analyzing nonstationary natural images using natural scene statistics models.

Proceedings ArticleDOI
12 Dec 2008
TL;DR: The visual quality of the images denoised using the proposed algorithm is shown to be higher compared to the MSE-optimal soft thresholding denoising solution, as measured by the SSIM index.

Abstract: In this paper, we present a novel algorithm for wavelet domain image denoising using the soft thresholding function. The thresholds are designed to be locally optimal with respect to the structural similarity (SSIM) index. The SSIM index is first expressed in terms of the coefficients of orthogonal wavelet transforms. The wavelet domain representation of the SSIM index, along with the assumption of a Gaussian prior for the wavelet coefficients, is used to formulate the soft thresholding optimization problem. A locally optimal solution is found using a quasi-Newton approach. This solution is applied to denoise images in the wavelet domain. The visual quality of the images denoised using the proposed algorithm is shown to be higher than that of the MSE-optimal soft thresholding denoising solution, as measured by the SSIM index.
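For context, a baseline wavelet-domain soft-thresholding denoiser is sketched below using PyWavelets. The paper's contribution is to replace the fixed universal threshold with thresholds chosen locally so as to maximize the SSIM index; this sketch only shows the surrounding machinery, not that optimization.

```python
import numpy as np
import pywt

def soft_threshold_denoise(img, wavelet='db4', level=3):
    """Baseline wavelet-domain soft thresholding with a universal threshold."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=level)
    # Noise estimate from the finest diagonal subband (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    t = sigma * np.sqrt(2 * np.log(img.size))
    out = [coeffs[0]] + [tuple(pywt.threshold(c, t, mode='soft') for c in band)
                         for band in coeffs[1:]]
    return pywt.waverec2(out, wavelet)
```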

Journal ArticleDOI
23 May 2008
TL;DR: It is argued that computer-aided detection will become an increasingly important tool for radiologists in the early detection of breast cancer, but there are some important issues that need to be given greater focus in designing CAD systems if they are to reach their full potential.
Abstract: The use of computer-aided detection (CAD) systems in mammography has been the subject of intense research for many years. These systems have been developed with the aim of helping radiologists to detect signs of breast cancer. However, the effectiveness of CAD systems in practice has sparked recent debate. In this commentary, we argue that computer-aided detection will become an increasingly important tool for radiologists in the early detection of breast cancer, but there are some important issues that need to be given greater focus in designing CAD systems if they are to reach their full potential.

Journal ArticleDOI
TL;DR: The amenable properties of Gaussian white noise images are analytically quantified to better understand current methodologies for detecting regions of phase instability and introduce a new, more effective means for identifying these regions based on the second derivative of phase.
Abstract: Exploiting the quasi-linear relationship between local phase and disparity, phase-differencing registration algorithms provide a fast, powerful means for disparity estimation. Unfortunately, these phase-differencing techniques suffer a significant impediment: phase nonlinearities. In regions of phase nonlinearity, the signals under consideration possess properties that invalidate the use of phase for disparity estimation. This paper uses the amenable properties of Gaussian white noise images to analytically quantify these properties. The improved understanding gained from this analysis enables us to better understand current methodologies for detecting regions of phase instability. Most importantly, we introduce a new, more effective means for identifying these regions based on the second derivative of phase.
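A one-dimensional sketch of phase-differencing disparity estimation under the quasi-linear phase model described above, together with a crude instability flag based on the second derivative of phase. The band-pass construction, center frequency, and threshold are assumptions for illustration only.

```python
import numpy as np

def phase_disparity(left_row, right_row, freq=0.1):
    """1-D phase-differencing sketch: band-pass both scanlines around `freq`, recover
    the local phase, and convert the phase difference into a pixel disparity using the
    (assumed constant) center frequency. Sign convention depends on the rectification."""
    n = len(left_row)
    t = np.arange(n)
    carrier = np.exp(-2j * np.pi * freq * t)

    def local_phase(row):
        # Demodulate around the center frequency, low-pass with a moving average,
        # then add the carrier phase back to obtain the total local phase.
        base = (np.asarray(row, dtype=float) - np.mean(row)) * carrier
        k = np.ones(15) / 15.0
        smooth = np.convolve(base.real, k, 'same') + 1j * np.convolve(base.imag, k, 'same')
        return np.unwrap(np.angle(smooth) + 2 * np.pi * freq * t)

    phiL, phiR = local_phase(left_row), local_phase(right_row)
    disparity = (phiL - phiR) / (2 * np.pi * freq)
    # Instability flag in the spirit of the paper: large second derivative of phase.
    unstable = np.abs(np.gradient(np.gradient(phiL))) > 1.0
    return disparity, unstable
```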

Proceedings ArticleDOI
24 Mar 2008
TL;DR: A gray scale object recognition system that is based on foveated corner finding and that uses elements of Lowe's SIFT algorithm that is tested on a set of tool and airplane images and shown to perform well.
Abstract: We present a gray scale object recognition system that is based on foveated corner finding and that uses elements of Lowe's SIFT algorithm. The principles behind the algorithm are the use of high-information gray-scale corners as features, and an efficient corner-finding strategy to find them. The system is tested on a set of tool and airplane images and shown to perform well.

Proceedings ArticleDOI
12 Dec 2008
TL;DR: An optimum texture-based fixation selection algorithm based on a recent theory of non-stationarity measurement in natural images is developed, and a simple coupling of the optimal texture-based and contrast-based fixation features is proposed to produce a new algorithm called CONTEXT, which exhibits robust performance for fixation selection in natural images.

Abstract: We present information-theoretic underpinnings of a computational theory of low-level visual fixations in natural images. In continuation of our prior work on optimal contrast-based fixations [1], we develop an optimum texture-based fixation selection algorithm based on a recent theory of non-stationarity measurement in natural images [2]. Thereafter, we propose a simple coupling of the optimal texture-based and contrast-based fixation features to produce a new algorithm called CONTEXT, which exhibits robust performance for fixation selection in natural images. The performance of the fixation algorithms is evaluated on natural images by comparison with randomized fixation strategies, using actual human fixations performed on the images as the reference. The fixation patterns obtained outperform randomized, GAFFE-based [3], and Itti [4] fixation strategies in terms of matching human fixation patterns. These results also demonstrate the important role that contrast and textural information play in low-level visual processes in the Human Visual System (HVS).

Proceedings ArticleDOI
01 Jan 2008
TL;DR: A novel multi-scale framework for video quality assessment that models motion in video sequences and is capable of capturing spatiotemporal artifacts in digital video is presented.
Abstract: With the rapid proliferation of digital video applications, the question of video quality control becomes central. We present a novel multi-scale framework for video quality assessment that models motion in video sequences and is capable of capturing spatiotemporal artifacts in digital video. Performance evaluation of the proposed metric on the VQEG database shows that the system is competitive with, and even performs better than, existing methods.

Book ChapterDOI
01 Jan 2008
TL;DR: The wavelet series expansion is analogous to the Fourier series, in that both methods represent continuous-time signals with a series of discrete coefficients.
Abstract: Linear system theory plays an important role in wavelet theory. A signal or function can often be better described, analyzed, or compressed if it is transformed into another domain using a linear transform such as the Fourier transform or a wavelet transform. Linear transformations of discrete signals can be expressed in linear algebraic forms, where the signals are considered as vectors and the transformations as matrix–vector multiplications. The wavelet series expansion is analogous to the Fourier series, in that both methods represent continuous-time signals with a series of discrete coefficients. A set of basis functions is formed by scaling and translating the basic wavelet, but the scaling and translation take only discrete values.
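A small numpy example of the discrete analogue of the wavelet series expansion described above, using the Haar wavelet: the signal is repeatedly split into scaling (approximation) and wavelet (detail) coefficients obtained from dyadic scalings and translations of one basic wavelet, and it can be reconstructed exactly from those discrete coefficients.

```python
import numpy as np

def haar_wavelet_coeffs(signal, levels=3):
    """Haar analysis: split into scaling and wavelet coefficients at each dyadic level.
    Assumes the signal length is divisible by 2**levels."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # wavelet (detail) coefficients
        approx = (even + odd) / np.sqrt(2)          # scaling (approximation) coefficients
    return approx, details

def haar_reconstruct(approx, details):
    """Exact synthesis from the coarsest approximation and the detail coefficients."""
    for d in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx
```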

Proceedings Article
06 Nov 2008
TL;DR: Preliminary results are presented on comparing different algorithms for cell segmentation and image denoising, as part of developing automated image analysis methods that increase cell analysis throughput.
Abstract: Automated analysis of fluorescence microscopy images of endothelial cells labeled for actin is important for quantifying changes in the actin cytoskeleton. The current manual approach is laborious and inefficient. The goal of our work is to develop automated image analysis methods, thereby increasing cell analysis throughput. In this study, we present preliminary results on comparing different algorithms for cell segmentation and image denoising.




Proceedings ArticleDOI
24 Mar 2008
TL;DR: In this paper, a hybrid approach combining strategies of both phase differencing and local correlation is proposed to improve the performance of phase-differencing while incorporating multiscale aspects of local correlation.
Abstract: Exploiting the quasi-linear relationship between local phase and disparity, phase-differencing registration algorithms provide a fast, powerful means for disparity estimation. Unfortunately, phase-differencing techniques suffer from a significant impediment: the neglect of multi-scale information. In this work, we introduce a novel registration algorithm that combines strategies of both phase-differencing and local correlation. This hybrid approach retains the advantageous properties of phase-differencing while incorporating the multiscale aspects of local correlation.

Dissertation
01 Jan 2008
TL;DR: A gray scale object recognition system based on foveated corner finding, the computation of sequential fixation points, and elements of Lowe's SIFT transform that achieves rotational, transformational, and limited scale invariant object recognition that produces recognition decisions using data extracted from sequential fixation Points.
Abstract: Here we describe a gray scale object recognition system based on foveated corner finding, the computation of sequential fixation points, and elements of Lowe's SIFT transform. The system achieves rotational, translational, and limited scale-invariant object recognition, producing recognition decisions using data extracted from sequential fixation points. It is broken into two logical steps. The first is to develop principles of foveated visual search and automated fixation selection to accomplish corner search. The result is a new algorithm for finding corners, which is also a corner-based algorithm for aiming computed foveated visual fixations. In the algorithm, long saccades move the fovea to previously unexplored areas of the image, while short saccades improve the accuracy of putative corner locations. The system is tested on two natural scenes. As an interesting comparison study, we compare fixations generated by the algorithm with those of subjects viewing the same images, whose eye movements were recorded by an eye tracker. The comparison of fixation patterns is made using an information-theoretic measure. Results show that the algorithm is a good locator of corners, but does not correlate particularly well with human visual fixations. The second step is to use the corners located, which meet certain goodness criteria, as keypoints in a modified version of the SIFT algorithm. Two scales are implemented. This implementation creates a database of SIFT features of known objects. To recognize an unknown object, a corner is located and a feature vector created. The feature vector is compared with those in the database of known objects. The process is continued for each corner in the unknown object until enough information has been accumulated to reach a decision. The system was tested on 78 gray scale objects, hand tools and airplanes, and shown to perform well.


Proceedings ArticleDOI
12 May 2008
TL;DR: A method for computing dense stereo correspondences in calibrated monocular video by iteratively and stochastically sampling match quality values in the disparity search space by perturbing a correspondence estimate with random noise and formulating an influence at each sample based on the perturbation and its effect on correspondence match quality.
Abstract: We present a method for computing dense stereo correspondences in calibrated monocular video by iteratively and stochastically sampling match quality values in the disparity search space. Most existing methods exhaustively compute local correspondence quality before searching for a globally optimal solution. Instead, we iteratively refine a correspondence estimate by perturbing it with random noise and formulating an influence at each sample based on the perturbation and its effect on correspondence match quality. Local influence is aggregated to recover consistent trends in match quality caused by the piecewise-continuous structure of the scene. Correspondence estimates for a given frame pair are seeded with the estimates from the previous frame pair, allowing convergence to occur across multiple frame pairs.
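A much-simplified sketch of the sampling idea described above: each disparity estimate is perturbed with random noise and nudged toward perturbations that improve a local matching cost. The cost function, step sizes, and the omission of the influence-aggregation step are all simplifications for illustration, not the paper's method.

```python
import numpy as np

def match_cost(left, right, y, x, d, win=3):
    """Sum of absolute differences between a window in the left image and the window
    displaced by the (rounded) disparity d in the right image."""
    d = int(round(d))
    ys = slice(y - win, y + win + 1)
    return np.abs(left[ys, x - win:x + win + 1].astype(float)
                  - right[ys, x - win - d:x + win + 1 - d].astype(float)).sum()

def refine_disparity(left, right, disp, iters=20, sigma=1.0, step=0.5, seed=0):
    """Stochastic refinement sketch: perturb each estimate with random noise and move
    toward perturbations that lower the matching cost. Border handling and the paper's
    influence-aggregation step are omitted; disparities are kept small for simplicity."""
    rng = np.random.default_rng(seed)
    disp = np.clip(disp.astype(float), -4, 4)
    H, W = disp.shape
    for _ in range(iters):
        for y in range(8, H - 8):
            for x in range(8, W - 8):
                d_new = float(np.clip(disp[y, x] + rng.normal(0.0, sigma), -4, 4))
                if match_cost(left, right, y, x, d_new) < match_cost(left, right, y, x, disp[y, x]):
                    disp[y, x] += step * (d_new - disp[y, x])
    return disp
```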