
Showing papers by "Alan C. Bovik published in 2009"


Journal ArticleDOI
TL;DR: This article reviews the reasons why people want to love or leave the venerable (but perhaps hoary) MSE, surveys emerging alternative signal fidelity measures, and discusses their potential application to a wide variety of problems.
Abstract: In this article, we have reviewed the reasons why we (collectively) want to love or leave the venerable (but perhaps hoary) MSE. We have also reviewed emerging alternative signal fidelity measures and discussed their potential application to a wide variety of problems. The message we are trying to send here is not that one should abandon use of the MSE nor to blindly switch to any other particular signal fidelity measure. Rather, we hope to make the point that there are powerful, easy-to-use, and easy-to-understand alternatives that might be deployed depending on the application environment and needs. While we expect (and indeed, hope) that the MSE will continue to be widely used as a signal fidelity measure, it is our greater desire to see more advanced signal fidelity measures being used, especially in applications where perceptual criteria might be relevant. Ideally, the performance of a new signal processing algorithm might be compared to other algorithms using several fidelity criteria. Lastly, we hope that we have given further motivation to the community to consider recent advanced signal fidelity measures as design criteria for optimizing signal processing algorithms and systems. It is in this direction that we believe that the greatest benefit eventually lies.
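The multi-criterion comparison advocated above is easy to carry out in practice. The following is a minimal sketch, not taken from the article, that scores a distorted image against its reference with both MSE/PSNR and SSIM using scikit-image; the file names are placeholders.

```python
# Sketch: comparing MSE/PSNR with SSIM on a reference/distorted pair.
# Requires NumPy and scikit-image; the image file names are hypothetical.
from skimage import io, img_as_float
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

ref = img_as_float(io.imread("reference.png", as_gray=True))
dist = img_as_float(io.imread("distorted.png", as_gray=True))

mse = mean_squared_error(ref, dist)                        # pointwise fidelity
psnr = peak_signal_noise_ratio(ref, dist, data_range=1.0)  # log-scaled MSE
ssim = structural_similarity(ref, dist, data_range=1.0)    # structural fidelity

print(f"MSE = {mse:.5f}, PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```

Reporting several such criteria side by side is the kind of comparison the article recommends when evaluating a new algorithm.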

2,601 citations


Journal ArticleDOI
TL;DR: A new measure of image similarity called the complex wavelet structural similarity (CW-SSIM) index is introduced and its applicability as a general purpose image similarity index is shown; it is also demonstrated to be computationally less expensive and robust to small rotations and translations.
Abstract: We introduce a new measure of image similarity called the complex wavelet structural similarity (CW-SSIM) index and show its applicability as a general purpose image similarity index. The key idea behind CW-SSIM is that certain image distortions lead to consistent phase changes in the local wavelet coefficients, and that a consistent phase shift of the coefficients does not change the structural content of the image. By conducting four case studies, we have demonstrated the superiority of the CW-SSIM index against other indices (e.g., Dice, Hausdorff distance) commonly used for assessing the similarity of a given pair of images. In addition, we show that the CW-SSIM index has a number of advantages. It is robust to small rotations and translations. It provides useful comparisons even without a preprocessing image registration step, which is essential for other indices. Moreover, it is computationally less expensive.
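The key idea (magnitude similarity plus phase consistency of local complex wavelet coefficients) can be written compactly. The sketch below computes the local CW-SSIM term for one patch of coefficients in the form commonly given for the index; the wavelet decomposition itself (e.g., a steerable pyramid) is assumed to be done elsewhere, and K is a small stabilizing constant.

```python
# Sketch of the local CW-SSIM term for one patch of complex wavelet coefficients.
import numpy as np

def cw_ssim_local(cx: np.ndarray, cy: np.ndarray, K: float = 1e-4) -> float:
    cx, cy = cx.ravel(), cy.ravel()
    cross = cx * np.conj(cy)
    # Magnitude term: compares coefficient magnitudes, as in real-valued SSIM.
    mag = (2 * np.sum(np.abs(cx) * np.abs(cy)) + K) / \
          (np.sum(np.abs(cx) ** 2) + np.sum(np.abs(cy) ** 2) + K)
    # Phase term: equals 1 when the relative phase is consistent across the patch,
    # which is why a uniform phase shift (a small translation) leaves the score unchanged.
    phase = (2 * np.abs(np.sum(cross)) + K) / (2 * np.sum(np.abs(cross)) + K)
    return float(mag * phase)
```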

568 citations


Journal ArticleDOI
TL;DR: This comprehensive and state-of-the-art approach to image processing gives engineers and students a thorough introduction, and includes full coverage of key applications: image watermarking, fingerprint recognition, face recognition, iris recognition and medical imaging.
Abstract: A complete introduction to the basic and intermediate concepts of image processing from the leading people in the field. A CD-ROM contains 70 highly interactive demonstration programs with user-friendly interfaces to provide a visual presentation of the concepts. Up-to-date content, including statistical modeling of natural images, anisotropic diffusion, image quality and the latest developments in JPEG 2000. This comprehensive and state-of-the-art approach to image processing gives engineers and students a thorough introduction, and includes full coverage of key applications: image watermarking, fingerprint recognition, face recognition, iris recognition and medical imaging. To help learn the concepts and techniques, the book contains a CD-ROM of 70 highly interactive visual demonstrations. Key algorithms and their implementation details are included, along with the latest developments in the standards. "This book combines basic image processing techniques with some of the most advanced procedures. Introductory chapters dedicated to general principles are presented alongside detailed application-orientated ones. As a result it is suitably adapted for different classes of readers, ranging from Master to PhD students and beyond." - Prof. Jean-Philippe Thiran, EPFL, Lausanne, Switzerland. "Al Bovik's compendium proceeds systematically from fundamentals to today's research frontiers. Professor Bovik, himself a highly respected leader in the field, has invited an all-star team of contributors. Students, researchers, and practitioners of image processing alike should benefit from the Essential Guide." - Prof. Bernd Girod, Stanford University, USA. "This book is informative, easy to read with plenty of examples, and allows great flexibility in tailoring a course on image processing or analysis." - Prof. Pamela Cosman, University of California, San Diego, USA. * A complete and modern introduction to the basic and intermediate concepts of image processing, edited and written by the leading people in the field. * An essential reference for all types of engineers working on image processing applications. * A CD-ROM contains 70 highly interactive demonstration programs with user-friendly interfaces to provide a visual presentation of the concepts. * Up-to-date content, including statistical modelling of natural images, anisotropic diffusion, image quality and the latest developments in JPEG 2000.

477 citations


Journal ArticleDOI
TL;DR: Two strategies for weighting image quality measurements by visual importance, visual fixation-based weighting and quality-based weighting, are described; these strategies are found to significantly improve correlations with subjective judgment.
Abstract: Recent image quality assessment (IQA) metrics achieve high correlation with human perception of image quality. Naturally, it is of interest to produce even better results. One promising method is to weight image quality measurements by visual importance. To this end, we describe two strategies: visual fixation-based weighting and quality-based weighting. By contrast with some prior studies, we find that these strategies can improve the correlations with subjective judgment significantly. We demonstrate improvements on the SSIM index in both its multiscale and single-scale versions, using the LIVE database as a test-bed.
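As an illustration of visual-importance weighting, the sketch below pools a per-pixel SSIM map against a normalized weight map (for example a fixation-density or saliency map) instead of taking a plain average; both maps are assumed to be precomputed and aligned, and this is not the paper's exact scheme.

```python
# Sketch: importance-weighted pooling of an SSIM map.
import numpy as np

def weighted_pool(ssim_map: np.ndarray, weight_map: np.ndarray) -> float:
    w = weight_map / (weight_map.sum() + 1e-12)  # normalize weights to sum to 1
    return float(np.sum(w * ssim_map))           # plain averaging is the special case of constant w
```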

350 citations


Book
10 Jun 2009
TL;DR: This comprehensive and state-of-the-art approach to video processing gives engineers and students a thorough introduction and includes full coverage of key applications: wireless video, video networks, video indexing and retrieval, and the use of video in speech processing.
Abstract: This comprehensive and state-of-the-art approach to video processing gives engineers and students a thorough introduction and includes full coverage of key applications: wireless video, video networks, video indexing and retrieval, and the use of video in speech processing. Containing all the essential methods in video processing alongside the latest standards, it is a complete resource for the professional engineer, researcher and graduate student. Numerous conceptual and numerical examples. All the latest standards are thoroughly covered: MPEG-1, MPEG-2, MPEG-4, H.264 and AVC. Coverage of the latest techniques in video security. "Like its sister volume "The Essential Guide to Image Processing," Professor Bovik's Essential Guide to Video Processing provides a timely and comprehensive survey, with contributions from leading researchers in the area. Highly recommended for everyone with an interest in this fascinating and fast-moving field." - Prof. Bernd Girod, Stanford University, USA. * Edited by a leading person in the field who created the IEEE International Conference on Image Processing, with contributions from experts in their fields. * Numerous conceptual and numerical examples. * All the latest standards are thoroughly covered: MPEG-1, MPEG-2, MPEG-4, H.264 and AVC. * Coverage of the latest techniques in video security.

178 citations


Proceedings ArticleDOI
TL;DR: 3-SSIM (or 3-MS-SSIM) provides results consistent with human subjectivity when judging the quality of blurred and noisy images, and also delivers better performance than SSIM (and MS-SSIM) on five types of distorted images from the LIVE Image Quality Assessment Database.
Abstract: The assessment of image quality is very important for numerous image processing applications, where the goal of image quality assessment (IQA) algorithms is to automatically assess the quality of images in a manner that is consistent with human visual judgment. Two prominent examples, the Structural Similarity Image Metric (SSIM) and Multi-scale Structural Similarity (MS-SSIM), operate under the assumption that human visual perception is highly adapted for extracting structural information from a scene. Results in large human studies have shown that these quality indices perform very well relative to other methods. However, the performance of SSIM and other IQA algorithms is less effective when used to rate amongst blurred and noisy images. We address this defect by considering a three-component image model, leading to the development of modified versions of SSIM and MS-SSIM, which we call three-component SSIM (3-SSIM) and three-component MS-SSIM (3-MS-SSIM). A three-component image model was proposed by Ran and Farvardin [13], wherein an image was decomposed into edges, textures and smooth regions. Different image regions have different importance for visual perception; thus, we apply different weights to the SSIM scores according to the region in which they are calculated. Thus, four steps are executed: (1) Calculate the SSIM (or MS-SSIM) map. (2) Segment the original (reference) image into three categories of regions (edges, textures and smooth regions). Edge regions are found where a gradient magnitude estimate is large, while smooth regions are determined where the gradient magnitude estimate is small. Textured regions are taken to fall between these two thresholds. (3) Apply non-uniform weights to the SSIM (or MS-SSIM) values over the three regions. The weight for edge regions was fixed at 0.5, for textured regions at 0.25, and for smooth regions at 0.25. (4) Pool the weighted SSIM (or MS-SSIM) values, typically by taking their weighted average, thus defining a single quality index for the image (3-SSIM or 3-MS-SSIM). Our experimental results show that 3-SSIM (or 3-MS-SSIM) provides results consistent with human subjectivity when judging the quality of blurred and noisy images, and also delivers better performance than SSIM (and MS-SSIM) on five types of distorted images from the LIVE Image Quality Assessment Database.
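The four steps above translate almost directly into code. The following is a minimal sketch, assuming the SSIM map is already available (for example from structural_similarity(..., full=True)); the gradient thresholds are illustrative placeholders rather than the paper's values, while the 0.5/0.25/0.25 weights follow the abstract.

```python
# Sketch of 3-SSIM pooling: segment the reference by gradient magnitude into
# edge / texture / smooth regions and pool the SSIM map with per-region weights.
import numpy as np
from scipy import ndimage

def three_ssim_pool(ssim_map, reference, t_low=0.04, t_high=0.12):
    gx = ndimage.sobel(reference.astype(float), axis=1)
    gy = ndimage.sobel(reference.astype(float), axis=0)
    grad = np.hypot(gx, gy)
    grad /= grad.max() + 1e-12

    edge = grad >= t_high              # large gradient magnitude -> edge region
    smooth = grad < t_low              # small gradient magnitude -> smooth region
    texture = ~(edge | smooth)         # in between -> textured region

    weights = 0.5 * edge + 0.25 * texture + 0.25 * smooth
    return float(np.sum(weights * ssim_map) / np.sum(weights))
```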

99 citations


Journal ArticleDOI
TL;DR: The acquisition procedure is documented, common eye movement statistics are summarised, and numerous research topics for which DOVES may be used are highlighted.
Abstract: DOVES, a database of visual eye movements, is a set of eye movements collected from 29 human observers as they viewed 101 natural calibrated images. Recorded using a high-precision dual-Purkinje eye tracker, the database consists of around 30 000 fixation points, and is believed to be the first large-scale database of eye movements to be made available to the vision research community. The database, along with MATLAB functions for its use, may be downloaded freely from http://live.ece.utexas.edu/research/doves, and used without restriction for educational and research purposes, providing that this paper is cited in any published work. This paper documents the acquisition procedure, summarises common eye movement statistics, and highlights numerous research topics for which DOVES may be used.

95 citations


Proceedings ArticleDOI
29 Jul 2009
TL;DR: This paper considers natural scene statistics and adopts multi-resolution decomposition methods to extract reliable features for no-reference image and video blur assessment, and shows that the algorithm has high correlation with human judgment in assessing blur distortion of images.
Abstract: The increasing number of demanding consumer video applications, as exemplified by cell phone and other low-cost digital cameras, has boosted interest in no-reference objective image and video quality assessment (QA). In this paper, we focus on no-reference image and video blur assessment. There already exist a number of no-reference blur metrics, but most are based on evaluating the widths of intensity edges, which may not reflect real image quality in many circumstances. Instead, we consider natural scene statistics and adopt multi-resolution decomposition methods to extract reliable features for QA. First, a probabilistic support vector machine (SVM) is applied as a rough image quality evaluator; then the detail image is used to refine and form the final blur metric. The algorithm is tested on the LIVE Image Quality Database; the results show the algorithm has high correlation with human judgment in assessing blur distortion of images.
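To give a sense of the two-stage idea, the sketch below computes generic multi-resolution (wavelet-subband) statistics with PyWavelets and hands them to a probabilistic SVM from scikit-learn. The specific features, wavelet, and training data are placeholders for illustration, not the paper's actual design.

```python
# Sketch: generic wavelet-subband statistics as blur-sensitive features,
# scored by a probabilistic SVM trained on hypothetical labeled data.
import numpy as np
import pywt
from sklearn.svm import SVC

def subband_features(image: np.ndarray, wavelet: str = "db2", levels: int = 3) -> np.ndarray:
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:              # (horizontal, vertical, diagonal) bands per level
        for band in detail:
            feats.append(np.std(band))     # subband spread tends to shrink with blur
            feats.append(np.mean(np.abs(band)))
    return np.asarray(feats)

# Hypothetical usage, with X_train / y_train as placeholder training data:
# clf = SVC(probability=True).fit(X_train, y_train)
# p = clf.predict_proba(subband_features(test_image).reshape(1, -1))
```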

81 citations


Book ChapterDOI
TL;DR: In this article, the authors examine objective criteria for the evaluation of image quality as perceived by an average human observer and highlight the similarities, dissimilarities, and interplay between these seemingly diverse techniques.
Abstract: Publisher Summary This chapter examines objective criteria for the evaluation of image quality as perceived by an average human observer. The focus is on image fidelity, i.e., how close an image is to a given original or reference image. This paradigm of image quality assessment (QA) is also known as full reference image QA. Three classes of image QA algorithms that correlate with visual perception significantly better are discussed: human vision based metrics, Structural SIMilarity (SSIM) metrics, and information theoretic metrics. Each of these techniques approaches the image QA problem from a different perspective and using different first principles. In addition to these QA techniques, this chapter also highlights the similarities, dissimilarities, and interplay between these seemingly diverse techniques.

49 citations


Proceedings ArticleDOI
TL;DR: The MOtion-based Video Integrity Evaluation (MOVIE) index presented in this paper is an objective, full reference video quality index that integrates both spatial and temporal aspects of distortion assessment, and is shown to be competitive with, and even to outperform, existing video quality assessment systems.
Abstract: There is a great deal of interest in methods to assess the perceptual quality of a video sequence in a full reference framework. Motion plays an important role in human perception of video, and videos suffer from several artifacts that arise from inaccuracies in the representation of motion in the test video compared to the reference. However, existing algorithms to measure video quality focus primarily on capturing spatial artifacts in the video signal, and are inadequate at modeling motion perception and capturing temporal artifacts in videos. We present an objective, full reference video quality index known as the MOtion-based Video Integrity Evaluation (MOVIE) index that integrates both spatial and temporal aspects of distortion assessment. MOVIE explicitly uses motion information from the reference video and evaluates the quality of the test video along the motion trajectories of the reference video. The performance of MOVIE is evaluated using the VQEG FR-TV Phase I dataset and MOVIE is shown to be competitive with, and even to outperform, existing video quality assessment systems.

45 citations


Proceedings ArticleDOI
TL;DR: Two hypotheses about spatial pooling strategies for the popular SSIM metrics are explored: that visual attention and gaze direction indicate 'where' a human looks, and that humans tend to perceive 'poor' regions in an image with more severity than 'good' ones, and hence penalize images with even a small number of 'poor' regions more heavily.
Abstract: Spatial pooling strategies used in recent Image Quality Assessment (IQA) algorithms have generally been that of simply averaging the values of the obtained scores across the image. Given that certain regions in an image are perceptually more important than others, it is not unreasonable to suspect that gains can be achieved by using an appropriate pooling strategy. In this paper, we explore two hypotheses about spatial pooling strategies for the popular SSIM metrics [1, 2]. The first is visual attention and gaze direction, i.e., 'where' a human looks. The second is that humans tend to perceive 'poor' regions in an image with more severity than the 'good' ones, and hence penalize images with even a small number of 'poor' regions more heavily. The improvements in correlation between the objective metrics' scores and human perception are demonstrated by evaluating the performance of these pooling strategies on the LIVE database [3] of images.
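A minimal sketch of the second hypothesis is percentile (worst-case) pooling, in which the overall score emphasizes the lowest-scoring fraction of the SSIM map; the 6% fraction below is an arbitrary illustration, not a value from the paper.

```python
# Sketch: worst-case (percentile) pooling of an SSIM map.
import numpy as np

def worst_case_pool(ssim_map: np.ndarray, fraction: float = 0.06) -> float:
    scores = np.sort(ssim_map.ravel())
    k = max(1, int(fraction * scores.size))
    return float(scores[:k].mean())        # average over the poorest-quality pixels only
```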

Journal ArticleDOI
TL;DR: Evidence is provided for the involvement of band-pass mechanisms along feature dimensions (spatial frequency and orientation) during visual search and an unusual phenomenon is observed whereby distracters containing close-to-vertical structures are fixated in searches for nonvertically oriented targets.

Journal ArticleDOI
TL;DR: This model takes into account important aspects of video compression such as transform coding, motion compensation, and variable length coding and estimates distortion within 1.5 dB of actual simulation values in terms of peak-signal-to-noise ratio.
Abstract: Multimedia communication has become one of the main applications in commercial wireless systems. Multimedia sources, mainly consisting of digital images and videos, have high bandwidth requirements. Since bandwidth is a valuable resource, it is important that its use should be optimized for image and video communication. Therefore, interest in developing new joint source-channel coding (JSCC) methods for image and video communication is increasing. Design of any JSCC scheme requires an estimate of the distortion at different source coding rates and under different channel conditions. The common approach to obtain this estimate is via simulations or operational rate-distortion curves. These approaches, however, are computationally intensive and, hence, not feasible for real-time coding and transmission applications. A more feasible approach to estimate distortion is to develop models that predict distortion at different source coding rates and under different channel conditions. Based on this idea, we present a distortion model for estimating the distortion due to quantization and channel errors in MPEG-4 compressed video streams at different source coding rates and channel bit error rates. This model takes into account important aspects of video compression such as transform coding, motion compensation, and variable length coding. Results show that our model estimates distortion within 1.5 dB of actual simulation values in terms of peak-signal-to-noise ratio.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: In this paper, a new quality metric for range images based on the multi-scale Structural Similarity (MS-SSIM) Index is proposed, which operates in a manner similar to SSIM but allows for special handling of missing data.
Abstract: We propose a new quality metric for range images that is based on the multi-scale Structural Similarity (MS-SSIM) Index. The new metric operates in a manner similar to SSIM but allows for special handling of missing data. We demonstrate its utility by reevaluating the set of stereo algorithms evaluated in the Middlebury Stereo Vision Page http://vision.middlebury.edu/stereo/. The new algorithm, which we term the Range SSIM (R-SSIM) Index, possesses features that make it an attractive choice for assessing the quality of range images.
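As a rough illustration of scoring range maps that contain missing data, the sketch below fills invalid pixels, computes a standard SSIM map with scikit-image, and pools only over locations where both range maps are valid. This is a crude approximation of the idea rather than the R-SSIM definition, and the invalid_value marker is an assumption.

```python
# Sketch: SSIM-style scoring of range maps with missing data excluded from pooling.
import numpy as np
from skimage.metrics import structural_similarity

def masked_ssim(range_ref: np.ndarray, range_test: np.ndarray, invalid_value=0) -> float:
    valid = (range_ref != invalid_value) & (range_test != invalid_value)
    fill = range_ref[valid].mean()
    a = np.where(valid, range_ref, fill).astype(float)
    b = np.where(valid, range_test, fill).astype(float)
    data_range = float(a.max() - a.min()) or 1.0
    _, ssim_map = structural_similarity(a, b, data_range=data_range, full=True)
    return float(ssim_map[valid].mean())   # pool only over mutually valid pixels
```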

Journal ArticleDOI
TL;DR: An optimal solution for maximizing the expected visual entropy over an orthogonal frequency division multiplexing (OFDM)-based broadband network from the perspective of cross-layer optimization is explored.
Abstract: To achieve seamless multimedia streaming services over wireless networks, it is important to overcome inter-cell interference (ICI), particularly in cell border regions. In this regard scalable video coding (SVC) has been actively studied due to its advantage of channel adaptation. We explore an optimal solution for maximizing the expected visual entropy over an orthogonal frequency division multiplexing (OFDM)-based broadband network from the perspective of cross-layer optimization. An optimization problem is parameterized by a set of source and channel parameters that are acquired along the user location over a multicell environment. A suboptimal solution is suggested using a greedy algorithm that allocates the radio resources to the scalable bitstreams as a function of their visual importance. The simulation results show that the greedy algorithm effectively resists ICI in the cell border region, while conventional nonscalable coding suffers severely because of ICI.

Book ChapterDOI
01 Dec 2009
TL;DR: In this chapter, linear image processing filters are characterized in terms of their frequency responses, specifically by their spectrum shaping properties, and each broad class of filters is shown to have some generalized applications.
Abstract: Publisher Summary Linear image processing filters are characterized in terms of their frequency responses, specifically by their spectrum shaping properties. Coarse descriptions that apply to many two-dimensional image processing filters include lowpass, bandpass, or highpass. In such cases, the frequency response is primarily a function of radial frequency, and may even be circularly symmetric, viz., a function of U^2 + V^2 only. In other cases, the filter may be strongly directional or oriented, with a response strongly depending on the frequency angle of the input. Of course, the terms lowpass, bandpass, highpass, and oriented are only rough qualitative descriptions of a system's frequency response. Each broad class of filters has some generalized applications. For example, lowpass filters strongly attenuate all but the lower radial image frequencies (as determined by some bandwidth or cutoff frequency), and so are primarily smoothing filters. They are commonly used to reduce high-frequency noise, or to eliminate all but coarse image features, or to reduce the bandwidth of an image prior to transmission through a low-bandwidth communication channel or before subsampling the image.
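A small sketch of the circularly symmetric case follows: a Gaussian lowpass filter whose frequency response depends only on U^2 + V^2, applied by pointwise multiplication in the frequency domain. The cutoff value is arbitrary.

```python
# Sketch: isotropic (circularly symmetric) Gaussian lowpass filtering in the frequency domain.
import numpy as np

def gaussian_lowpass(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    rows, cols = image.shape
    U = np.fft.fftfreq(rows)[:, None]        # vertical spatial frequencies
    V = np.fft.fftfreq(cols)[None, :]        # horizontal spatial frequencies
    H = np.exp(-(U ** 2 + V ** 2) / (2 * cutoff ** 2))   # response depends on U^2 + V^2 only
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))
```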

Proceedings ArticleDOI
07 Nov 2009
TL;DR: It is shown that traffic flow computed by optical flow estimation effectively captures traffic scene activity and the statistics of traffic flow vectors contain meaningful and interesting characteristics.
Abstract: This paper describes methods for extracting traffic flow information from urban traffic scenes. The ultimate goal is to collect a macroscopic view of traffic flow information in a fully automatic and segmentation-free way. First, traffic flow is calculated by optical flow estimation. Then, traffic flow regions are defined by the initial traffic flow, and further analysis is performed only in the defined traffic flow regions. Basic statistics of the traffic flow vectors are studied. It is shown that traffic flow computed by optical flow estimation effectively captures traffic scene activity. Also, the statistics of traffic flow vectors contain meaningful and interesting characteristics. An example application demonstrates the applicability and potential uses of the statistics.
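The first two steps (dense optical flow, then basic statistics of the flow vectors) can be sketched with OpenCV as below. The Farneback estimator and the thresholds are stand-ins for illustration; the paper's traffic-flow-region definition is not reproduced here.

```python
# Sketch: dense optical flow between consecutive frames and simple flow statistics.
import cv2
import numpy as np

def flow_statistics(prev_gray: np.ndarray, curr_gray: np.ndarray) -> dict:
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = mag > 1.0                                   # crude "activity" mask
    if not moving.any():
        return {"mean_speed": 0.0, "dominant_direction": 0.0, "active_fraction": 0.0}
    return {"mean_speed": float(mag[moving].mean()),
            "dominant_direction": float(np.median(ang[moving])),
            "active_fraction": float(moving.mean())}
```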

Patent
07 Dec 2009
TL;DR: A plurality of grey-level co-occurrence matrices (GLCMs) is determined for each partition, a plurality of second-order statistical attributes is extracted for each GLCM, and a feature vector is constructed for each partition, where the feature vector includes the second-order statistical attributes of each GLCM for the partition.
Abstract: Method for detecting textural defects in an image. The image, which may have an irregular visual texture, may be received. The image may be decomposed into a plurality of subbands. The image may be partitioned into a plurality of partitions. A plurality of grey-level co-occurrence matrices (GLCMs) may be determined for each partition. A plurality of second-order statistical attributes may be extracted for each GLCM. A feature vector may be constructed for each partition, where the feature vector includes the second-order statistical attributes for each GLCM for the partition. Each partition may be classified based on the feature vector for the respective partition. Classification of the partitions may utilize a one-class support vector machine, and may determine if a defect is present in the image.
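A sketch of the per-partition pipeline is given below: co-occurrence matrices and second-order statistics via scikit-image, and a one-class SVM from scikit-learn trained on defect-free partitions. The offsets, GLCM properties, and SVM settings are illustrative choices, not those claimed in the patent.

```python
# Sketch: GLCM feature vector per partition, classified by a one-class SVM.
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # spelled greycomatrix in older scikit-image
from sklearn.svm import OneClassSVM

def glcm_feature_vector(partition: np.ndarray) -> np.ndarray:
    # partition is assumed to be an 8-bit (0..255) gray-level patch.
    glcm = graycomatrix(partition, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical usage, with X_ok built from defect-free partitions:
# detector = OneClassSVM(nu=0.05, gamma="scale").fit(X_ok)
# is_defect = detector.predict(glcm_feature_vector(test_patch).reshape(1, -1)) == -1
```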

Book ChapterDOI
01 Jan 2009
TL;DR: This chapter describes the basic tools for digital image processing, and one of the most important nonlinear point operations is histogram equalization, also called histogram flattening.
Abstract: Publisher Summary This chapter describes the basic tools for digital image processing. The basic tool that is used in designing point operations on digital images is the image histogram. The histogram of the digital image is a plot or graph of the frequency of occurrence of each gray level. Hence, a histogram is a one-dimensional function whose domain is the set of gray levels and whose possible range extends from 0 to the number of pixels in the image. One of the most important nonlinear point operations is histogram equalization, also called histogram flattening. The idea behind it extends that of FSHS (full-scale histogram stretch): not only should an image fill the available grayscale range but also it should be uniformly distributed over that range. Hence an idealized goal is a flat histogram. Although care must be taken in applying a powerful nonlinear transformation that actually changes the shape of the image histogram, rather than just stretching it, there are good mathematical reasons for regarding a flat histogram as a desirable goal. In a certain sense, an image with a perfectly flat histogram contains the largest possible amount of information or complexity.
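For concreteness, the sketch below performs histogram equalization on an 8-bit image: the cumulative histogram supplies the point operation (a lookup table) that pushes the gray-level distribution toward uniformity.

```python
# Sketch: histogram equalization ("histogram flattening") of an 8-bit image.
import numpy as np

def equalize_histogram(image: np.ndarray) -> np.ndarray:
    hist = np.bincount(image.ravel(), minlength=256)   # frequency of each gray level
    cdf = hist.cumsum() / image.size                   # cumulative distribution in [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)         # the nonlinear point operation
    return lut[image]
```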

Journal ArticleDOI
TL;DR: A method is presented for calculating the color compensation matrix, which quantifies the color spillover between channels, for multichannel fluorescence images whose specimens are combinatorially stained.
Abstract: Multicolor fluorescence in situ hybridization (M-FISH) techniques provide color karyotyping that allows simultaneous analysis of numerical and structural abnormalities of whole human chromosomes. Chromosomes are stained combinatorially in M-FISH. By analyzing the intensity combinations of each pixel, all chromosome pixels in an image are classified. Due to the overlap of excitation and emission spectra and the broad sensitivity of image sensors, the obtained images contain crosstalk between the color channels. The crosstalk complicates both visual and automatic image analysis and may eventually affect the classification accuracy in M-FISH. The removal of crosstalk is possible by finding the color compensation matrix, which quantifies the color spillover between channels. However, there exists no simple method of finding the color compensation matrix from multichannel fluorescence images whose specimens are combinatorially hybridized. In this paper, we present a method of calculating the color compensation matrix for multichannel fluorescence images whose specimens are combinatorially stained.
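Once such a compensation matrix is available, crosstalk removal amounts to inverting a per-pixel linear mixing. The sketch below shows that application step only; estimating the matrix from combinatorially stained specimens, which is the contribution of the paper, is not reproduced.

```python
# Sketch: applying a known color compensation (spillover) matrix to a channel stack.
import numpy as np

def remove_crosstalk(image: np.ndarray, spillover: np.ndarray) -> np.ndarray:
    # image: H x W x C fluorescence channels; spillover: C x C matrix whose entry
    # (i, j) is the fraction of dye j's signal observed in channel i.
    h, w, c = image.shape
    pixels = image.reshape(-1, c).T                   # C x (H*W) measured intensities
    corrected = np.linalg.solve(spillover, pixels)    # undo the linear mixing per pixel
    return np.clip(corrected.T.reshape(h, w, c), 0, None)
```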

Journal ArticleDOI
01 May 2009
TL;DR: This paper critically reviews methods that have been proposed for assessing multiclass classifiers and concludes that the method proposed by Scurfield provides the most detailed description of classifier performance and insight about the sources of error in a given classification task, and that the methods proposed by He and Nakas also have great practical utility.
Abstract: Assessment of classifier performance is critical for fair comparison of methods, including considering alternative models or parameters during system design. The assessment must not only provide meaningful data on the classifier efficacy, but it must do so in a concise and clear manner. For two-class classification problems, receiver operating characteristic analysis provides a clear and concise assessment methodology for reporting performance and comparing competing systems. However, many other important biomedical questions cannot be posed as "two-class" classification tasks and more than two classes are often necessary. While several methods have been proposed for assessing the performance of classifiers for such multiclass problems, none has been widely accepted. The purpose of this paper is to critically review methods that have been proposed for assessing multiclass classifiers. A number of these methods provide a classifier performance index called the volume under surface (VUS). Empirical comparisons are carried out using 4 three-class case studies, in which three popular classification techniques are evaluated with these methods. Since the same classifier was assessed using multiple performance indexes, it is possible to gain insight into the relative strengths and weaknesses of the measures. We conclude that: 1) the method proposed by Scurfield provides the most detailed description of classifier performance and insight about the sources of error in a given classification task and 2) the methods proposed by He and Nakas also have great practical utility as they provide both the VUS and an estimate of the variance of the VUS. These estimates can be used to statistically compare two classification algorithms.
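For intuition about the quantity being compared, the sketch below shows one common empirical estimator of the three-class VUS: the fraction of cross-class triples that a scalar decision value ranks in the correct order. It illustrates the VUS itself and is not the specific estimation procedure of Scurfield or of He and Nakas.

```python
# Sketch: empirical volume under the ROC surface (VUS) for three ordered classes.
import numpy as np

def empirical_vus(s1: np.ndarray, s2: np.ndarray, s3: np.ndarray) -> float:
    # s1, s2, s3: decision values for samples of class 1, 2, 3 (expected ordering s1 < s2 < s3).
    correct = 0
    for a in s1:
        for b in s2:
            correct += np.count_nonzero((a < b) & (b < s3))  # triples ranked in the right order
    return correct / (len(s1) * len(s2) * len(s3))
```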

Patent
30 Apr 2009
TL;DR: A method and apparatus detects one or more spiculated masses in an image using a processor, and an enhanced image is created by combining an output from all of the filtering steps.
Abstract: A method and apparatus detects one or more spiculated masses in an image using a processor. The image is received in the processor. The received image is filtered using one or more Gaussian filters to detect one or more central mass regions. The received image is also filtered using one or more spiculated lesion filters to detect where the one or more spiculated masses converge. In addition, the received image is filtered using one or more Difference-of-Gaussian filters to suppress one or more linear structures. An enhanced image showing the detected spiculated masses is created by combining an output from all of the filtering steps. The enhanced image is then provided to an output of the processor.
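As an illustration of one of the filtering stages named above, the sketch below implements a generic Difference-of-Gaussians (DoG) band-pass filter with SciPy; the two scales are arbitrary and the sketch does not reproduce the patent's specific filter design.

```python
# Sketch: a generic Difference-of-Gaussians (DoG) band-pass filter.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_filter(image: np.ndarray, sigma_small: float = 2.0, sigma_large: float = 6.0) -> np.ndarray:
    img = image.astype(float)
    # The difference of two smoothed copies passes structures between the two scales
    # while attenuating slowly varying background.
    return gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large)
```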

Book ChapterDOI
01 Dec 2009
TL;DR: The way in which motion is handled in video processing largely determines how videos will be perceived or analyzed, and one of the first steps in a large percentage of video processing algorithms is motion estimation, whereby the movement of intensities or colors is estimated.
Abstract: Publisher Summary The main application of digital video processing is to provide high-quality visible-light videos for human consumption. Digital video processing encompasses many approaches that derive from the essential principles of digital image processing. Indeed, it is best to become conversant in the techniques of digital image processing before embarking on the study of digital video processing. However, there is one important aspect of video processing that significantly distinguishes it from still image processing, makes necessary significant modifications of still image processing methods for adaptation to video, and also requires the development of entirely new processing philosophies. That aspect is motion. It is largely the motion of 3D objects and their 2D projections that determines our visual experience of the world. The way in which motion is handled in video processing largely determines how videos will be perceived or analyzed. Indeed, one of the first steps in a large percentage of video processing algorithms is motion estimation, whereby the movement of intensities or colors is estimated. These motion estimates can be used in a wide variety of ways for video processing and analysis.
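Since motion estimation is singled out above as a typical first step, the sketch below shows one classical approach, exhaustive block matching with a sum-of-absolute-differences cost. It is a generic illustration rather than an algorithm from the chapter; the block size and search range are arbitrary.

```python
# Sketch: exhaustive block-matching motion estimation between two gray-level frames.
import numpy as np

def block_matching(prev: np.ndarray, curr: np.ndarray, block: int = 16, search: int = 8) -> np.ndarray:
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    prev_f = prev.astype(float)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(float)
            best, best_cost = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cost = np.abs(prev_f[y:y + block, x:x + block] - target).sum()
                        if cost < best_cost:
                            best, best_cost = (dy, dx), cost
            vectors[by // block, bx // block] = best   # (vertical, horizontal) displacement
    return vectors
```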

Proceedings ArticleDOI
01 Nov 2009
TL;DR: It is demonstrated that MC-SSIM correlates well with human perception of quality, and it is described how a simple and efficient implementation of MC-SSIM can be realized.
Abstract: We propose a new full reference video quality assessment algorithm (FR VQA) - the motion compensated structural similarity index (MC-SSIM). MC-SSIM evaluates spatial quality as well as quality along temporal trajectories. Its computational simplicity makes it a prime choice for practical implementation. In this paper we describe the algorithm and evaluate its performance on a publicly available VQA dataset. We demonstrate that MC-SSIM correlates well with human perception of quality. We also explore its relationship to the human visual system and describe how a simple and efficient implementation of MC-SSIM can be realized.

Book ChapterDOI
01 Dec 2009
TL;DR: Binary images arise in a number of ways; usually they are created from gray level images for simplified processing or for printing, although certain types of sensors directly deliver a binary image output.
Abstract: Publisher Summary Binary images arise in a number of ways. Usually they are created from gray level images for simplified processing or for printing. However, certain types of sensors directly deliver a binary image output. Such devices are usually associated with printed, handwritten, or line drawing images, with the input signal being entered by hand on a pressure sensitive tablet, a resistive pad, or a light pen. Usually a binary image is obtained from a gray level image by some process of information abstraction. The advantage of the B-fold reduction in the required image storage space is offset by what can be a significant loss of information in the resulting binary image. However, if the process is accomplished with care, then a simple abstraction of information can be obtained that can enhance subsequent processing, analysis, or interpretation of the image.
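The most common route from a gray level image to a binary one is global thresholding, sketched below for an 8-bit image; the fixed threshold is a placeholder rather than an automatically selected value.

```python
# Sketch: global thresholding of an 8-bit gray level image into a binary image.
import numpy as np

def binarize(image: np.ndarray, threshold: int = 128) -> np.ndarray:
    return (image >= threshold).astype(np.uint8)   # 1 = object, 0 = background
```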

Proceedings ArticleDOI
29 Jul 2009
TL;DR: The temporal characteristics of undistorted as well as distorted IP video sequences (distorted by varying levels of packet loss rate), as extracted from optical flow vectors, are explored.
Abstract: We model the motion statistics of video sequences, towards the development of no-reference video quality indices that take into account spatial as well as temporal characteristics of video signals. Here we explore the temporal characteristics of undistorted as well as distorted IP video sequences (distorted by varying levels of packet loss rate), as extracted from optical flow vectors. We present an algorithm for extracting motion statistics by computing independent components (ICs) from the optical flow field. We then model the extracted ICs, and show that they are more closely Laplacian distributed than the entire nondecomposed features. We also observe that the lower the video quality, the higher the root-mean-square (RMS) error difference between the maximum-likelihood Laplacian fits of the two extracted ICs of the flow vectors.
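Under simplifying assumptions, the modeling steps can be sketched as follows: FastICA extracts two independent components from the horizontal and vertical flow samples, each component is fit by a maximum-likelihood Laplacian (location at the median, scale equal to the mean absolute deviation), and an RMS error between the empirical histogram and the fit is reported. The flow field is assumed to be available; this is not the paper's exact implementation.

```python
# Sketch: ICA of optical flow components, ML Laplacian fits, and histogram-vs-fit RMS error.
import numpy as np
from sklearn.decomposition import FastICA

def laplacian_fit_rms(flow_u: np.ndarray, flow_v: np.ndarray, bins: int = 101) -> list:
    samples = np.column_stack([flow_u.ravel(), flow_v.ravel()])
    ics = FastICA(n_components=2, random_state=0).fit_transform(samples)
    rms_errors = []
    for ic in ics.T:
        mu = np.median(ic)                        # ML estimate of Laplacian location
        b = np.mean(np.abs(ic - mu)) + 1e-12      # ML estimate of Laplacian scale
        hist, edges = np.histogram(ic, bins=bins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        fit = np.exp(-np.abs(centers - mu) / b) / (2 * b)
        rms_errors.append(float(np.sqrt(np.mean((hist - fit) ** 2))))
    return rms_errors
```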

Journal ArticleDOI
TL;DR: Recognition memory for fixated regions from briefly viewed full-screen natural images is examined and it is revealed that observers fixated, on average, image regions that possessed greater visual saliency than non-fixated regions, a finding that is robust across multiple fixation indices.
Abstract: Recognition memory for fixated regions from briefly viewed full-screen natural images is examined. Low-level image statistics reveal that observers fixated, on average (pooled across images and observers), image regions that possessed greater visual saliency than non-fixated regions, a finding that is robust across multiple fixation indices. Recognition-memory performance indicates that, of the fixation loci tested, observers were adept at recognising those with a particular profile of image statistics; visual saliency was found to be attenuated for unrecognised loci, despite that all regions were freely fixated. Furthermore, although elevated luminance was the local image statistic found to discriminate least between human and random image locations, it was the greatest predictor of recognition-memory performance, demonstrating a dissociation between image features that draw fixations and those that support visual memory. An analysis of corresponding eye movements indicates that image regions fixated via short-distance saccades enjoyed better recognition-memory performance, alluding to a focal rather than ambient mode of processing. Recognised image regions were more likely to have originated from areas evaluated (a posteriori) to have higher fixation density, a numerical metric of local interest. Surprisingly, memory for image regions fixated later in the viewing period exhibited no recency advantage, despite (typically) also being longer in duration, a finding for which a number of explanations are posited.

Journal ArticleDOI
TL;DR: This paper provides a set of sampling theorems that offer a path for designing foveation strategies that are optimal with respect to average epipolar area.
Abstract: Biological vision systems have inspired and will continue to inspire the development of computer vision systems. One biological tendency that has never been exploited is the symbiotic relationship between foveation and uncalibrated active, binocular vision systems. The primary goal of any binocular vision system is the correspondence of the two retinal images. For calibrated binocular rigs the search for corresponding points can be restricted to epipolar lines. In an uncalibrated system the precise geometry is unknown. However, the set of possible geometries can be restricted to some reasonable range; and consequently, the search for matching points can be confined to regions delineated by the union of all possible epipolar lines over all possible geometries. We call these regions epipolar spaces. The accuracy and complexity of any correspondence algorithm is directly proportional to the size of these epipolar spaces. Consequently, the introduction of a spatially variant foveation strategy that reduces the average area per epipolar space is highly desirable. This paper provides a set of sampling theorems that offer a path for designing foveation strategies that are optimal with respect to average epipolar area.

Proceedings ArticleDOI
TL;DR: This study finds that a one-sided generalized Gaussian distribution closely fits the prior of the range gradient, which sheds new light on statistical modeling of 2D and 3D image features in natural scenes.
Abstract: Range maps have been actively studied in the last few years in the context of depth perception in natural scenes. With the availability of co-registered luminance information, we have the ability to examine and model the statistical relationships between luminance, range and disparity. In this study, we find that a one-sided generalized Gaussian distribution closely fits the prior of the range gradient. This finding sheds new light on statistical modeling of 2D and 3D image features in natural scenes.
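As an illustration of fitting such a one-sided generalized Gaussian, the sketch below fits SciPy's halfgennorm distribution (the upper half of a generalized normal) to finite-difference range-gradient magnitudes; the gradient operator and the use of halfgennorm are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch: fitting a one-sided generalized Gaussian to range-gradient magnitudes.
import numpy as np
from scipy import stats

def fit_range_gradient_prior(range_map: np.ndarray):
    gy, gx = np.gradient(range_map.astype(float))
    grad_mag = np.hypot(gx, gy).ravel()
    beta, loc, scale = stats.halfgennorm.fit(grad_mag, floc=0)  # shape, location (fixed at 0), scale
    return beta, scale
```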