Showing papers by "Alan C. Bovik" published in 2007


Proceedings ArticleDOI
15 Apr 2007
TL;DR: A novel quality metric for video sequences is proposed that utilizes motion information, the key element that distinguishes video from still images.
Abstract: Quality assessment plays a very important role in almost all aspects of multimedia signal processing, such as acquisition, coding, display, and processing. Several objective quality metrics have been proposed for images, but video quality assessment has received relatively little attention, and most video quality metrics have been simple extensions of metrics for images. In this paper, we propose a novel quality metric for video sequences that utilizes motion information, the main difference in moving from images to video. This metric is capable of capturing temporal artifacts in video sequences in addition to spatial distortions. Results are presented that demonstrate the efficacy of our quality metric by comparing model performance against subjective scores on the database developed by the Video Quality Experts Group.

109 citations


Journal ArticleDOI
TL;DR: This work develops a mathematical framework for quantifying and understanding multidimensional frequency modulations in digital images and derives the ordinary differential equations (ODEs) that describe image flowlines.
Abstract: We develop a mathematical framework for quantifying and understanding multidimensional frequency modulations in digital images. We begin with the widely accepted definition of the instantaneous frequency vector (IF) as the gradient of the phase and define the instantaneous frequency gradient tensor (IFGT) as the tensor of component derivatives of the IF vector. Frequency modulation bounds are derived and interpreted in terms of the eigendecomposition of the IFGT. Using the IFGT, we derive the ordinary differential equations (ODEs) that describe image flowlines. We study the diagonalization of the ODEs of multidimensional frequency modulation in the IFGT eigenvector coordinate system and suggest that separable transforms can be computed along these coordinates. We illustrate these new methods of image pattern analysis on textured and fingerprint images. We envision that this work will find value in applications involving the analysis of image textures that are nonstationary yet exhibit local regularity. Examples of such textures abound in nature.
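
For reference, the two definitions at the heart of the abstract can be written compactly (the phase symbol φ and the tensor symbol T are my notation, not taken from the paper): the IF vector is the gradient of the image phase, and the IFGT collects the component derivatives of the IF, i.e. the Hessian of the phase.

    \boldsymbol{\omega}(x,y) = \nabla\varphi(x,y) =
        \begin{pmatrix} \varphi_x \\ \varphi_y \end{pmatrix}
    \qquad \text{(instantaneous frequency vector)}

    T(x,y) = \nabla\boldsymbol{\omega}(x,y) =
        \begin{pmatrix} \varphi_{xx} & \varphi_{xy} \\ \varphi_{yx} & \varphi_{yy} \end{pmatrix}
    \qquad \text{(instantaneous frequency gradient tensor)}

The frequency modulation bounds mentioned above are then interpreted through the eigendecomposition of T.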

71 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: It is shown that incorporating domain specific knowledge about the structural diversity of human faces significantly improves the performance of 3D human face recognition algorithms.
Abstract: We present a systematic procedure for selecting facial fiducial points associated with diverse structural characteristics of a human face. We identify such characteristics from the existing literature on anthropometric facial proportions. We also present three-dimensional (3D) face recognition algorithms, which employ Euclidean/geodesic distances between these anthropometric fiducial points as features along with linear discriminant analysis classifiers. Furthermore, we show that in our algorithms, when anthropometric distances are replaced by distances between arbitrary regularly spaced facial points, their performance decreases substantially. This demonstrates that incorporating domain specific knowledge about the structural diversity of human faces significantly improves the performance of 3D human face recognition algorithms.
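
A minimal sketch of the feature-and-classifier pipeline described in the abstract, under the assumption that the anthropometric fiducial points have already been localized as 3D coordinates; the function names are mine and scikit-learn's LDA stands in for whatever implementation the authors used (geodesic distances, which require a surface mesh and a geodesic solver, are omitted):

    # Hypothetical sketch: pairwise Euclidean distances between anthropometric
    # fiducial points as features, fed to a linear discriminant analysis classifier.
    import numpy as np
    from itertools import combinations
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def distance_features(fiducials):
        """fiducials: (K, 3) array of 3D anthropometric fiducial point coordinates."""
        return np.array([np.linalg.norm(fiducials[i] - fiducials[j])
                         for i, j in combinations(range(len(fiducials)), 2)])

    # X: list of (K, 3) fiducial arrays, y: subject labels (assumed to be available)
    # clf = LinearDiscriminantAnalysis().fit([distance_features(f) for f in X], y)
    # predicted_subject = clf.predict([distance_features(probe_fiducials)])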

57 citations


Journal ArticleDOI
TL;DR: Foveation produced an increased difference between human and random patch ensembles for contrast and its higher-order statistics, and an eccentricity-based analysis showed that shorter saccades were more likely to land on patches with higher values of these features.

39 citations


Journal ArticleDOI
TL;DR: A new algorithm for finding corners is developed, which is also a corner-based algorithm for aiming computed foveated visual fixations, although the resulting fixations do not correlate particularly well with human visual fixations.
Abstract: We cast the problem of corner detection as a corner search process. We develop principles of foveated visual search and automated fixation selection to accomplish the corner search, supplying a case study of both foveated search and foveated feature detection. The result is a new algorithm for finding corners, which is also a corner-based algorithm for aiming computed foveated visual fixations. In the algorithm, long saccades move the fovea to previously unexplored areas of the image, while short saccades improve the accuracy of putative corner locations. The system is tested on two natural scenes. As an interesting comparison study, we compare fixations generated by the algorithm with those of subjects viewing the same images, whose eye movements are recorded by an eye tracker. The comparison of fixation patterns is made using an information-theoretic measure. Results show that the algorithm is a good locator of corners, but does not correlate particularly well with human visual fixations.

26 citations


Proceedings ArticleDOI
TL;DR: A novel 3D face recognition algorithm that employs geodesic and Euclidean distances between facial fiducial points along with 'global curvature' characteristics is proposed and shown to be robust to changes in facial expression.
Abstract: We propose a novel method to improve the performance of existing three-dimensional (3D) human face recognition algorithms that employ Euclidean distances between facial fiducial points as features. We further investigate a novel 3D face recognition algorithm that employs geodesic and Euclidean distances between facial fiducial points. We demonstrate that this algorithm is robust to changes in facial expression. Geodesic and Euclidean distances were calculated between pairs of 25 facial fiducial points. For the proposed algorithm, geodesic distances and 'global curvature' characteristics, defined as the ratio of geodesic to Euclidean distance between a pair of points, were employed as features. The most discriminatory features were selected using stepwise linear discriminant analysis (LDA). These were projected onto 11 LDA directions, and face models were matched using the Euclidean distance metric. With a gallery set containing one image each of 105 subjects and a probe set containing 663 images of the same subjects, the algorithm produced EER=1.4% and a rank 1 RR=98.64%. It performed significantly better than existing algorithms based on principal component analysis and LDA applied to face range images. Its verification performance for expressive faces was also significantly better than an algorithm that employed Euclidean distances between facial fiducial points as features.
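
The 'global curvature' feature defined in the abstract, the ratio of geodesic to Euclidean distance for a pair of fiducial points, is simple to state in code; a small sketch follows, assuming the geodesic distance itself is supplied by a surface-path algorithm (e.g. fast marching over the face mesh), which is not shown:

    import numpy as np

    def global_curvature(p, q, geodesic_dist):
        """p, q: 3D fiducial points; geodesic_dist: surface distance between them."""
        euclidean = np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
        return geodesic_dist / euclidean  # near 1 on flat regions, larger where the surface bulges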

23 citations


Journal ArticleDOI
TL;DR: A novel variant of the classification image paradigm that allows us to rapidly reveal strategies used by observers in visual search tasks is proposed and a new classification taxonomy is introduced that distinguishes between foveal and peripheral processes.
Abstract: We propose a novel variant of the classification image paradigm that allows us to rapidly reveal strategies used by observers in visual search tasks. We make use of eye tracking, 1/f noise, and a grid-like stimulus ensemble and also introduce a new classification taxonomy that distinguishes between foveal and peripheral processes. We tested our method for three human observers and two simple shapes used as search targets. The classification images obtained show the efficacy of the proposed method by revealing the features used by the observers in as few as 200 trials. Using two control experiments, we evaluated the use of naturalistic 1/f noise with classification images, in comparison with the more commonly used white noise, and compared the performance of our technique with that of an earlier approach without a stimulus grid.
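
As a small illustration of one stimulus ingredient above, here is a sketch (my own construction, not the authors' stimulus code) of generating a naturalistic 1/f-noise field by shaping white Gaussian noise in the frequency domain so that its amplitude spectrum falls off as 1/f:

    import numpy as np

    def pink_noise_image(n, seed=0):
        """Return an n x n noise field with an approximately 1/f amplitude spectrum."""
        rng = np.random.default_rng(seed)
        fx = np.fft.fftfreq(n)[:, None]
        fy = np.fft.fftfreq(n)[None, :]
        f = np.sqrt(fx**2 + fy**2)
        f[0, 0] = 1.0                                  # avoid division by zero at DC
        spectrum = np.fft.fft2(rng.standard_normal((n, n))) / f
        img = np.real(np.fft.ifft2(spectrum))
        return (img - img.mean()) / img.std()          # zero-mean, unit-variance field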

21 citations


Proceedings ArticleDOI
TL;DR: This Keynote Address paper describes recent successful advances in QA algorithms for still images, specifically the Structural SIMilarity (SSIM) Index and the Visual Information Fidelity (VIF) Index, and efforts towards extending these Image Quality Assessment frameworks to the much more complex problem of Video Quality Assessment.
Abstract: In this Keynote Address paper, we review early work on Image and Video Quality Assessment against the backdrop of an interpretation of image perception as a visual communication problem. As a way of explaining our recent work on Video Quality Assessment, we first describe our recent successful advances on QA algorithms for still images, specifically, the Structural SIMilarity (SSIM) Index and the Visual Information Fidelity (VIF) Index. We then describe our efforts towards extending these Image Quality Assessment frameworks to the much more complex problem of Video Quality Assessment. We also discuss our current efforts towards the design and construction of a generic and publicly-available Video Quality Assessment database.
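
For reference, the standard form of the SSIM index mentioned above, as published in the SSIM literature (mu and sigma denote local patch means, variances, and cross-covariance, and C_1, C_2 are small stabilizing constants):

    \mathrm{SSIM}(x, y) =
        \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
             {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

The overall image score is typically taken as the mean of this quantity over local windows.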

15 citations


Journal ArticleDOI
TL;DR: This paper uses a novel technique akin to psychophysical reverse correlation and stimuli that emulate the natural visual environment to measure observers' ability to locate a low-contrast target of unknown orientation and provides strong evidence for saccadic selectivity for spatial frequencies close to the target's central frequency.
Abstract: The human visual system is remarkably adept at finding objects of interest in cluttered visual environments, a task termed visual search. Because the human eye is highly foveated, it accomplishes this by making many discrete fixations linked by rapid eye movements called saccades. In such naturalistic tasks, we know very little about how the brain selects saccadic targets (the fixation loci). In this paper, we use a novel technique akin to psychophysical reverse correlation and stimuli that emulate the natural visual environment to measure observers' ability to locate a low-contrast target of unknown orientation. We present three main discoveries. First, we provide strong evidence for saccadic selectivity for spatial frequencies close to the target's central frequency. Second, we demonstrate that observers have distinct, idiosyncratic biases to certain orientations in saccadic programming, although there were no priors imposed on the target's orientation. These orientation biases cover a subset of the near-cardinal (horizontal/vertical) and near-oblique orientations, with orientations near vertical being the most common across observers. Further, these idiosyncratic biases were stable across time. Third, within observers, very similar biases exist for foveal target detection accuracy. These results suggest that saccadic targeting is tuned for known stimulus dimensions (here, spatial frequency) and also has some preference or default tuning for uncertain stimulus dimensions (here, orientation).

14 citations


Proceedings ArticleDOI
12 Nov 2007
TL;DR: A novel technique to extract features from 3D face representations: the nose tip is first automatically located on the range image, and the range data from a hexagonal region of interest around this landmark is then decomposed using Barycentric wavelet kernels.
Abstract: Interest in face recognition systems has increased significantly due to the emergence of substantial commercial opportunities in surveillance and security applications. In this paper we propose a novel technique to extract features from 3D face representations. In this technique, the nose tip is first automatically located on the range image; then the range data from a hexagonal region of interest around this landmark is decomposed using Barycentric wavelet kernels. The dimensionality of the extracted coefficients at each resolution level is reduced using principal component analysis (PCA). These new features are tested on 206 range images, and a high classification accuracy is achieved using a small number of features. The obtained accuracy is competitive with that of other techniques in the literature.
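
A hedged sketch of the dimensionality-reduction step described in the abstract; the Barycentric wavelet decomposition itself is not reproduced here, so coeffs_per_level is assumed to already hold one (n_samples x n_coefficients) matrix per resolution level, the component count is illustrative, and concatenating the reduced levels into one feature vector is my assumption about how the features are combined:

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_levels(coeffs_per_level, n_components=10):
        """Apply PCA independently at each resolution level, then concatenate."""
        reduced = [PCA(n_components=n_components).fit_transform(c)
                   for c in coeffs_per_level]
        return np.hstack(reduced)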

13 citations


Proceedings ArticleDOI
12 Nov 2007
TL;DR: This paper defines the corresponding epipolar space in the other image as the union of all associated epipolar lines over all possible system geometries, eliminating the need for calibration at the cost of an increased search region.
Abstract: Depth recovery for active binocular vision systems is simplified if the camera geometry is known and corresponding points can be restricted to epipolar lines. Unfortunately, computation of epipolar lines requires calibration which can be complex and inaccurate. While it is possible to register images without geometric information, such unconstrained algorithms are usually time consuming and prone to error. In this paper we propose a compromise. Even without the instantaneous knowledge of the system geometry, we can restrict the region of correspondence by imposing limits on the possible range of configurations, and as a result, confine our search for matching points to epipolar spaces. For each point in one image, we define the corresponding epipolar space in the other image as the union of all associated epipolar lines over all possible system geometries. Epipolar spaces eliminate the need for calibration at the cost of an increased search region.
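
An illustrative sketch (my own construction, not the paper's implementation) of the idea: approximate a point's epipolar space by rasterizing the epipolar lines l = F x produced by a sampled set of admissible fundamental matrices and taking their union as a binary search mask:

    import numpy as np

    def epipolar_space_mask(x, F_samples, shape, thickness=1.0):
        """x: homogeneous point (3,) in image 1; F_samples: iterable of 3x3
        fundamental matrices spanning the admissible system geometries;
        shape: (height, width) of image 2."""
        h, w = shape
        v, u = np.mgrid[0:h, 0:w]
        pts = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (h, w, 3)
        mask = np.zeros(shape, dtype=bool)
        for F in F_samples:
            line = F @ x                                   # epipolar line l = F x
            dist = np.abs(pts @ line) / np.hypot(line[0], line[1])
            mask |= dist <= thickness                      # pixels within the line band
        return mask

Matching can then be confined to the pixels where the mask is true, trading a larger search region for freedom from calibration.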

Proceedings ArticleDOI
12 Nov 2007
TL;DR: The proposed non-stationarity index is conceptually simple and is intertwined with the probabilistic structure of the image segment being analyzed, and it is expected to find useful applications in computer vision algorithms.
Abstract: We present a novel approach for non-stationarity detection in natural images by exploiting the prior knowledge of the independent component structure of scene statistics. Our proposed non-stationarity index is conceptually simple and is intertwined with the probabilistic structure of the image segment being analyzed. It shows consistently good results when applied to natural scenes and, we expect, will find useful applications in computer vision algorithms in as much as the detection of statistically non-stationary locations in images can be an important preliminary step toward the understanding of scene content and in the guiding of visual fixations.

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A foundation of theorems is derived that provides a means for obtaining optimal sampling schemes for a given set of epipolar spaces, where an optimal scheme is defined as a strategy that minimizes the average area per epipolar space.
Abstract: If precise calibration information is unavailable, as is often the case for active binocular vision systems, the determination of epipolar lines becomes untenable. Yet, even without instantaneous knowledge of the geometry, the search for corresponding points can be restricted to areas called epipolar spaces. For each point in one image, we define the corresponding epipolar space in the other image as the union of all associated epipolar lines over all possible system geometries. Epipolar spaces eliminate the need for calibration at the cost of an increased search region. One approach to mitigate this increase is the application of a space variant sampling or foveation strategy. While the application of such strategies to stereo vision tasks is not new, only rarely has a foveation scheme been specifically tailored for a stereo vision task. In this paper we derive a foundation of theorems that provide a means for obtaining optimal sampling schemes for a given set of epipolar spaces. An optimal sampling scheme is defined as a strategy that minimizes the average area per epipolar space.

Dissertation
01 Jan 2007
TL;DR: It is shown that under uncertainty, observers rely on known target characteristics to direct their saccades and to select target candidates upon foveal scrutiny, and multiple orientation characteristics of targets are represented in observer search strategies, modulated by their sensitivity/selectivity for each orientation.
Abstract: Visual search can simply be defined as the task of looking for an object of interest in a visual environment. Due to its foveated nature, the human visual system succeeds at such a task by making many discrete fixations linked by rapid eye movements called saccades. However, very little is known about how saccadic targets (fixation loci) are selected by the brain in such naturalistic tasks. Discoveries to be made are not only invaluable to the field of vision science but are very important in designing automated vision systems, which to this day lag in performance vis-a-vis human observers. What I have sought to accomplish in this dissertation has been to reveal previously unknown saccadic targeting and target selection strategies used by human observers in naturalistic visual search tasks. My driving goal has been to understand how the brain selects fixation loci and target candidates upon fixation, with the objective of using these findings for automated fixation selection algorithms employed for visual search. I have proposed a novel and efficient technique akin to psychophysical reverse correlation to study human observer strategies in locating low-contrast targets under a variety of experimental conditions. My technique has successfully been used to study saccadic programming and target selection in various experimental conditions, including visual searches for targets with known characteristics, targets whose orientation attributes are not known a priori, and targets containing multiple orientations. I have found visual guidance in saccadic targeting and target selection under all experimental conditions, revealed by observers' selectivity for spatial frequencies and/or orientations of stimuli close to those of the target. I have shown that under uncertainty, observers rely on known target characteristics to direct their saccades and to select target candidates upon foveal scrutiny. Moreover, I have demonstrated that multiple orientation characteristics of targets are represented in observer search strategies, modulated by their sensitivity/selectivity for each orientation. Some of my findings have been applied towards automated visual search algorithms.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: Perceptual Image Processing is taking an increasingly important role in the field of multimedia processing, and the receiver in this context is the marvelous human eye-cortex system, while the transmitter is the environment, which casts images of extraordinary variability onto camera and retinal sensors.
Abstract: Perceptual Image Processing is taking an increasingly important role in the field of multimedia processing. Designing algorithms to accord with visual perception is a natural idea, but has met with limited success owing to our imperfect knowledge of the intended receiver, and indeed, of the transmitter. The receiver in this context, of course, is the marvelous human eye-cortex system, while the transmitter is the environment, which casts images of extraordinary variability onto camera and retinal sensors.