
Showing papers on "Human visual system model published in 2012"


Book
22 Dec 2012
TL;DR: A book-length treatment of eye tracking, covering the human visual system (HVS), visual attention, the neurological substrate of the HVS, taxonomy and analysis of eye movements, eye tracking systems and methodology, and applications in neuroscience and psychology, industrial engineering and human factors, marketing/advertising, and computer science.
Abstract (table of contents): Introduction to the Human Visual System (HVS); Visual Attention; Neurological Substrate of the HVS; Visual Psychophysics; Taxonomy and Models of Eye Movements; Eye Tracking Systems; Eye Tracking Techniques; Head-Mounted System Hardware Installation; Head-Mounted System Software Development; Head-Mounted System Calibration; Table-Mounted System Hardware Installation; Table-Mounted System Software Development; Table-Mounted System Calibration; Eye Movement Analysis; Eye Tracking Methodology; Experimental Design; Suggested Empirical Guidelines; Case Studies; Eye Tracking Applications; Diversity and Types of Eye Tracking Applications; Neuroscience and Psychology; Industrial Engineering and Human Factors; Marketing/Advertising; Computer Science; Conclusion.

2,399 citations


Journal ArticleDOI
TL;DR: This paper surveys research on attention and visual perception, with a specific focus on results that have direct relevance to visualization and visual analytics.
Abstract: A fundamental goal of visualization is to produce images of data that support visual analysis, exploration, and discovery of novel insights. An important consideration during visualization design is the role of human visual perception. How we "see” details in an image can directly impact a viewer's efficiency and effectiveness. This paper surveys research on attention and visual perception, with a specific focus on results that have direct relevance to visualization and visual analytics. We discuss theories of low-level visual perception, then show how these findings form a foundation for more recent work on visual memory and visual attention. We conclude with a brief overview of how knowledge of visual attention and visual memory is being applied in visualization and graphics. We also discuss how challenges in visualization are motivating research in psychophysics.

330 citations


Book ChapterDOI
07 Oct 2012
TL;DR: This work collects a large human eye fixation database compiled from a pool of 600 2D-vs-3D image pairs viewed by 80 subjects, where the depth information is directly provided by the Kinect camera and the eye tracking data are captured in both 2D and 3D free-viewing experiments.
Abstract: Most previous studies on visual saliency have only focused on static or dynamic 2D scenes. Since the human visual system has evolved predominantly in natural three dimensional environments, it is important to study whether and how depth information influences visual saliency. In this work, we first collect a large human eye fixation database compiled from a pool of 600 2D-vs-3D image pairs viewed by 80 subjects, where the depth information is directly provided by the Kinect camera and the eye tracking data are captured in both 2D and 3D free-viewing experiments. We then analyze the major discrepancies between 2D and 3D human fixation data of the same scenes, which are further abstracted and modeled as novel depth priors. Finally, we evaluate the performances of state-of-the-art saliency detection models over 3D images, and propose solutions to enhance their performances by integrating the depth priors.

266 citations


Journal ArticleDOI
TL;DR: Attempts to model the spectral sensitivity of the circadian system are discussed, each of which varies in terms of its complexity and its consideration of retinal neuroanatomy and neurophysiology.
Abstract: It is now well established that the spectral, spatial, temporal and absolute sensitivities of the human circadian system are very different from those of the human visual system. Although qualitati...

239 citations


Journal ArticleDOI
TL;DR: It is proposed that gaze behavior while determining a person’s identity, emotional state, or gender can be explained as an adaptive brain strategy to learn eye movements that optimize performance in these evolutionarily important perceptual tasks.
Abstract: When viewing a human face, people often look toward the eyes. Maintaining good eye contact carries significant social value and allows for the extraction of information about gaze direction. When identifying faces, humans also look toward the eyes, but it is unclear whether this behavior is solely a byproduct of the socially important eye movement behavior or whether it has functional importance in basic perceptual tasks. Here, we propose that gaze behavior while determining a person’s identity, emotional state, or gender can be explained as an adaptive brain strategy to learn eye movement plans that optimize performance in these evolutionarily important perceptual tasks. We show that humans move their eyes to locations that maximize perceptual performance determining the identity, gender, and emotional state of a face. These optimal fixation points, which differ moderately across tasks, are predicted correctly by a Bayesian ideal observer that integrates information optimally across the face but is constrained by the decrease in resolution and sensitivity from the fovea toward the visual periphery (foveated ideal observer). Neither a model that disregards the foveated nature of the visual system and makes fixations on the local region with maximal information, nor a model that makes center-of-gravity fixations, correctly predicts human eye movements. Extension of the foveated ideal observer framework to a large database of real-world faces shows that the optimality of these strategies generalizes across the population. These results suggest that the human visual system optimizes face recognition performance through guidance of eye movements not only toward but, more precisely, just below the eyes.

234 citations


Journal ArticleDOI
TL;DR: Letting the human visual system determine the quantization curve used to encode video signals maintains optimal efficiency across the luminance range of interest and keeps the visibility of quantization artifacts to a uniformly small level.
Abstract: As the performance of electronic display systems continues to increase, the limitations of current signal coding methods become increasingly apparent. With bit-depth limitations set by industry standard interfaces, a more efficient coding system is desired to allow image quality to increase without requiring expansion of legacy infrastructure bandwidth. A good approach to this problem is to let the human visual system determine the quantization curve used to encode video signals. In this way, optimal efficiency is maintained across the luminance range of interest, and the visibility of quantization artifacts is kept to a uniformly small level.

195 citations
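For context on what such an HVS-derived quantization curve looks like in practice, here is a minimal sketch using the PQ constants later standardized as SMPTE ST 2084, which grew out of this line of work; the constant values and the 10-bit usage example are assumptions drawn from the published standard, not from the abstract itself.

```python
import numpy as np

# PQ (SMPTE ST 2084) constants; assumed from the published standard
# that followed this HVS-driven quantization work.
M1, M2 = 0.1593017578125, 78.84375
C1, C2, C3 = 0.8359375, 18.8515625, 18.6875

def pq_encode(luminance_cd_m2):
    """Map absolute luminance (0..10000 cd/m^2) to a 0..1 signal value."""
    y = np.clip(np.asarray(luminance_cd_m2, dtype=float) / 10000.0, 0.0, 1.0)
    return ((C1 + C2 * y**M1) / (1.0 + C3 * y**M1)) ** M2

# 10-bit code values for a few luminance levels: quantization steps stay
# near the visibility threshold across the whole range.
for nits in (0.1, 1.0, 100.0, 1000.0, 10000.0):
    print(f"{nits:>7} cd/m^2 -> code {round(float(pq_encode(nits)) * 1023)}")
```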


Book ChapterDOI
07 Oct 2012
TL;DR: This work complements existing state-of-the-art large-scale dynamic computer vision datasets like Hollywood-2 and UCF Sports with human eye movements collected under the ecological constraints of the visual action recognition task, and introduces novel dynamic consistency and alignment models, which underline the remarkable stability of patterns of visual search among subjects.
Abstract: Systems based on bag-of-words models operating on image features collected at maxima of sparse interest point operators have been extremely successful for both computer-based visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in "saccade and fixate" regimes, the knowledge, methodology, and emphasis in the human and the computer vision communities remains sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large-scale dynamic computer vision datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of the visual action recognition task. To our knowledge, these are the first human eye tracking datasets of significant size to be collected for video (497,107 frames, each viewed by 16 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, and (c) task control, as opposed to free-viewing. Second, we introduce novel dynamic consistency and alignment models, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the massive amounts of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point image sampling strategies and human fixations, as well as their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted, and when used in an end-to-end automatic system, leveraging some of the most advanced computer vision practice, can lead to state-of-the-art results.

172 citations


Journal ArticleDOI
TL;DR: A new saliency detection model based on the human visual sensitivity and the amplitude spectrum of quaternion Fourier transform (QFT) to represent the color, intensity, and orientation distributions for image patches is proposed.
Abstract: With the wide applications of saliency information in visual signal processing, many saliency detection methods have been proposed. However, some key characteristics of the human visual system (HVS) are still neglected in building these saliency detection models. In this paper, we propose a new saliency detection model based on human visual sensitivity and the amplitude spectrum of the quaternion Fourier transform (QFT). We use the amplitude spectrum of the QFT to represent the color, intensity, and orientation distributions for image patches. The saliency value for each image patch is calculated from not only the differences between the QFT amplitude spectrum of this patch and other patches in the whole image, but also the visual impact of these differences as determined by human visual sensitivity. The experimental results show that the proposed saliency detection model outperforms state-of-the-art detection models. In addition, we apply the proposed model to image retargeting and achieve better performance than conventional algorithms.

171 citations
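A rough sketch of the patch-based mechanism described above, with two loudly flagged simplifications: per-channel 2-D FFTs stand in for the quaternion Fourier transform, and a simple 1/(1+d) distance falloff stands in for the paper's human-visual-sensitivity weighting.

```python
import numpy as np

def patch_amplitudes(img, psize=16):
    """FFT amplitude spectrum per patch; per-channel 2-D FFTs stand in
    for the quaternion Fourier transform of the paper."""
    h, w, _ = img.shape
    ph, pw = h // psize, w // psize
    amps, centers = [], []
    for i in range(ph):
        for j in range(pw):
            patch = img[i*psize:(i+1)*psize, j*psize:(j+1)*psize]
            amps.append(np.abs(np.fft.fft2(patch, axes=(0, 1))).ravel())
            centers.append((i + 0.5, j + 0.5))
    return np.array(amps), np.array(centers), (ph, pw)

def saliency_map(img, psize=16):
    amps, centers, (ph, pw) = patch_amplitudes(img, psize)
    sal = np.zeros(len(amps))
    for k in range(len(amps)):
        diff = np.abs(amps - amps[k]).sum(axis=1)   # spectrum differences
        dist = np.hypot(*(centers - centers[k]).T)  # inter-patch distances
        weight = 1.0 / (1.0 + dist)                 # assumed sensitivity falloff
        sal[k] = (weight * diff).sum()
    return sal.reshape(ph, pw)

sal = saliency_map(np.random.rand(128, 128, 3))  # toy input
```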


Book ChapterDOI
07 Oct 2012
TL;DR: This work employs saliency-mapping algorithms to find informative regions; descriptors corresponding to these regions are either used exclusively or are given greater representational weight (additional codebook vectors), and saliency maps derived from human eye movements are used to probe the limits of the approach.
Abstract: Algorithms using "bag of features"-style video representations currently achieve state-of-the-art performance on action recognition tasks, such as the challenging Hollywood2 benchmark [1,2,3]. These algorithms are based on local spatiotemporal descriptors that can be extracted either sparsely (at interest points) or densely (on regular grids), with dense sampling typically leading to the best performance [1]. Here, we investigate the benefit of space-variant processing of inputs, inspired by attentional mechanisms in the human visual system. We employ saliency-mapping algorithms to find informative regions and descriptors corresponding to these regions are either used exclusively, or are given greater representational weight (additional codebook vectors). This approach is evaluated with three state-of-the-art action recognition algorithms [1,2,3], and using several saliency algorithms. We also use saliency maps derived from human eye movements to probe the limits of the approach. Saliency-based pruning allows up to 70% of descriptors to be discarded, while maintaining high performance on Hollywood2. Meanwhile, pruning of 20-50% (depending on model) can even improve recognition. Further improvements can be obtained by combining representations learned separately on salience-pruned and unpruned descriptor sets. Not surprisingly, using the human eye movement data gives the best mean Average Precision (mAP; 61.9%), providing an upper bound on what is possible with a high-quality saliency map. Even without such external data, the Dense Trajectories model [1] enhanced by automated saliency-based descriptor sampling achieves the best mAP (60.0%) reported on Hollywood2 to date.

134 citations
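The pruning step itself is simple to sketch; the keep fraction and the (x, y) position convention below are assumptions, and a real system would plug in one of the saliency algorithms the paper evaluates.

```python
import numpy as np

def prune_descriptors(descriptors, positions, saliency_map, keep_fraction=0.5):
    """Keep only descriptors extracted at the most salient image locations.
    `positions` holds integer (x, y) coordinates; keep_fraction is a tunable
    assumption (the paper reports that discarding 20-70% of descriptors
    preserves, and sometimes improves, recognition)."""
    sal = saliency_map[positions[:, 1], positions[:, 0]]  # saliency at (x, y)
    k = max(1, int(len(descriptors) * keep_fraction))
    keep = np.argsort(sal)[-k:]                           # most salient indices
    return descriptors[keep], positions[keep]
```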


Journal ArticleDOI
TL;DR: This paper thoroughly reviews recent advances in perceptual video compression, mainly in terms of three major components, namely, perceptual model definition, implementation of coding, and performance evaluation.
Abstract: With advances in understanding the perceptual properties of the human visual system and constructing computational models of them, efforts toward incorporating human perceptual mechanisms in video compression to achieve maximal perceptual quality have received great attention. This paper thoroughly reviews recent advances in perceptual video compression, mainly in terms of three major components, namely, perceptual model definition, implementation of coding, and performance evaluation. Furthermore, open research issues and challenges are discussed in order to provide perspectives for future research trends.

134 citations


Journal ArticleDOI
TL;DR: A new objective metric is proposed for the visual quality assessment of 3D meshes, based on a mesh local roughness measure derived from Gaussian curvature; it can predict the extent of the visual difference between a reference mesh and a distorted version.

Journal ArticleDOI
TL;DR: An improved algorithm based on the contrast mechanism of the human visual system (HVS) is proposed for infrared small-target detection in images with complicated backgrounds; it demonstrates superior and reliable detection performance, with a high detection rate and a low false alarm rate.

Journal ArticleDOI
TL;DR: This article presents a fusion-based contrast-enhancement technique which integrates information to overcome the limitations of different contrast-enhancement algorithms, and shows the efficiency of the method in enhancing details without affecting the colour balance or introducing saturation artefacts.
Abstract: The goal of contrast enhancement is to improve the visibility of image details without introducing unrealistic visual appearances and/or unwanted artefacts. While global contrast-enhancement techniques enhance the overall contrast, their dependence on the global content of the image limits their ability to enhance local details. They also result in significant changes in image brightness and introduce saturation artefacts. Local enhancement methods, on the other hand, improve image details but can produce block discontinuities, noise amplification and unnatural image modifications. To remedy these shortcomings, this article presents a fusion-based contrast-enhancement technique which integrates information to overcome the limitations of different contrast-enhancement algorithms. The proposed method balances the requirements of local and global contrast enhancement and a faithful representation of the original image appearance, an objective that is difficult to achieve using traditional enhancement methods. Fusion is performed in a multi-resolution fashion using Laplacian pyramid decomposition to account for the multi-channel properties of the human visual system. For this purpose, metrics are defined for contrast, image brightness and saturation. The performance of the proposed method is evaluated using visual assessment and quantitative measures for contrast, luminance and saturation. The results show the efficiency of the method in enhancing details without affecting the colour balance or introducing saturation artefacts and illustrate the usefulness of fusion techniques for image enhancement applications.
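A condensed sketch of the multi-resolution fusion idea for grayscale input, using OpenCV pyramids; the choice of enhancement inputs (histogram equalization and CLAHE) and the Laplacian-based weight map are illustrative assumptions, not the paper's actual metrics for contrast, brightness and saturation.

```python
import cv2
import numpy as np

def fuse(images, weights, levels=3):
    """Multi-resolution fusion of enhanced versions of a grayscale image:
    Laplacian pyramids of the inputs blended with Gaussian pyramids of
    per-pixel normalized weight maps, then collapsed."""
    wsum = np.sum(weights, axis=0) + 1e-12
    fused = None
    for img, w in zip(images, weights):
        gp_i = [img.astype(np.float32)]
        gp_w = [(w / wsum).astype(np.float32)]
        for _ in range(levels):
            gp_i.append(cv2.pyrDown(gp_i[-1]))
            gp_w.append(cv2.pyrDown(gp_w[-1]))
        lap = [gp_i[k] - cv2.pyrUp(gp_i[k+1], dstsize=gp_i[k].shape[1::-1])
               for k in range(levels)] + [gp_i[-1]]
        contrib = [l * g for l, g in zip(lap, gp_w)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]
    out = fused[-1]
    for k in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=fused[k].shape[1::-1]) + fused[k]
    return np.clip(out, 0, 255).astype(np.uint8)

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # assumed input file
glob = cv2.equalizeHist(img)                          # global enhancement
loc = cv2.createCLAHE(clipLimit=2.0).apply(img)       # local enhancement
w_loc = np.clip(np.abs(cv2.Laplacian(img, cv2.CV_32F)) / 255.0, 0, 1)
result = fuse([glob, loc], [1.0 - w_loc, w_loc])      # favor CLAHE near detail
```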

Journal ArticleDOI
TL;DR: A novel watermarking approach for copyright protection of color images, based on the wavelet transform, which yields a watermark that is invisible to human eyes and robust to a wide variety of common attacks.

Proceedings ArticleDOI
05 Nov 2012
TL;DR: This work presents a new visible tagging solution for active displays that allows a rolling-shutter camera to detect active tags robustly from a relatively large distance; it uses intelligent binary coding to encode digital positioning and shows potential applications such as large-screen interaction.
Abstract: We show a new visible tagging solution for active displays which allows a rolling-shutter camera to detect active tags from a relatively large distance in a robust manner. Current planar markers are visually obtrusive for the human viewer. In order for them to be read from afar and embed more information, they must be shown larger, thus occupying valuable physical space on the design. We present a new active visual tag which utilizes all dimensions of color, time and space while remaining unobtrusive to the human eye and decodable using a 15 fps rolling-shutter camera. The design exploits the flicker-fusion frequency threshold of the human visual system, which, due to the effect of metamerism, cannot resolve metamer pairs alternating beyond 120 Hz. Yet, concurrently, it is decodable using a 15 fps rolling-shutter camera due to the effective line-scan speed of 15×400 lines per second. We show that an off-the-shelf rolling-shutter camera can resolve the metamers flickering on a television from a distance of over 4 meters. We use intelligent binary coding to encode digital positioning and show potential applications such as large-screen interaction. We analyze the use of codes for locking and tracking encoded targets. We also analyze the constraints and performance of the sampling system, and discuss several plausible application scenarios.
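The sampling arithmetic in the abstract is worth making explicit:

```python
# Rolling-shutter arithmetic from the abstract: a 15 fps camera scanning
# 400 lines per frame samples 15 * 400 = 6000 lines per second.
fps, lines_per_frame = 15, 400
line_rate_hz = fps * lines_per_frame   # 6000 lines/s effective scan rate
flicker_hz = 120                       # metamer alternation rate

# Each flicker period spans line_rate_hz / flicker_hz = 50 scan lines, so
# the alternation shows up as readable bands in a single frame even though
# it lies beyond the human flicker-fusion threshold.
print(line_rate_hz / flicker_hz)       # 50.0
```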

Journal ArticleDOI
TL;DR: A new image quality assessment algorithm based on the phase and magnitude of the 2-D discrete Fourier transform; it is overall better than several existing full-reference algorithms and two RR algorithms, and is further scalable for RR scenarios.
Abstract: We present a new image quality assessment algorithm based on the phase and magnitude of the 2-D discrete Fourier transform. The basic idea is to compare the phase and magnitude of the reference and distorted images to compute the quality score. However, it is well known that the human visual system's sensitivity to different frequency components is not the same. We accommodate this fact via a simple yet effective strategy of non-uniform binning of the frequency components. This process also leads to a reduced-space representation of the image, thereby enabling the reduced-reference (RR) prospects of the proposed scheme. We employ linear regression to integrate the effects of the changes in phase and magnitude. In this way, the required weights are determined via proper training and hence are more convincing and effective. Last, using the fact that phase usually conveys more information than magnitude, we use only the phase for RR quality assessment. This provides the crucial advantage of further reduction in the required amount of reference image information. The proposed method is, therefore, further scalable for RR scenarios. We report extensive experimental results using a total of nine publicly available databases: seven image databases (with a total of 3832 distorted images with diverse distortions) and two video databases (with 228 distorted videos in total). These show that the proposed method is overall better than several of the existing full-reference algorithms and two RR algorithms. Additionally, there is a graceful degradation in prediction performance as the amount of reference image information is reduced, thereby confirming its scalability prospects. To enable comparisons and future study, a Matlab implementation of the proposed algorithm is available at http://www.ntu.edu.sg/home/wslin/reduced_phase.rar.
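A reduced-reference sketch of the phase-comparison idea under stated assumptions: logarithmic radial bins approximate the non-uniform binning, and simple averaging replaces the regression-trained pooling weights.

```python
import numpy as np

def rr_phase_score(ref, dist, n_bins=16):
    """Compare FFT phase of reference and distorted grayscale images inside
    radial frequency bins; returns a distortion value (lower is better)."""
    F_r, F_d = np.fft.fft2(ref), np.fft.fft2(dist)
    h, w = ref.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.hypot(fy, fx)
    # Logarithmic radial bins: finer at low frequencies, where the HVS
    # is more sensitive (an assumed form of the non-uniform binning).
    edges = np.geomspace(radius[radius > 0].min(), radius.max(), n_bins + 1)
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (radius >= lo) & (radius <= hi)
        if band.any():
            dphase = np.angle(F_r[band] * np.conj(F_d[band]))  # phase difference
            scores.append(np.mean(np.abs(dphase)))
    return float(np.mean(scores))
```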

Journal ArticleDOI
Chanho Jung, Changick Kim
TL;DR: A visual attention model is incorporated for efficient saliency detection, the salient regions are employed as object seeds for the authors' automatic object segmentation system, and an iterative self-adaptive segmentation framework is proposed for more accurate object segmentation.
Abstract: In this paper, a visual attention model is incorporated for efficient saliency detection, and the salient regions are employed as object seeds for our automatic object segmentation system. In contrast with existing interactive segmentation approaches that require considerable user interaction, the proposed method does not require it, i.e., the segmentation task is fulfilled in a fully automatic manner. First, we introduce a novel unified spectral-domain approach for saliency detection. Our visual attention model originates from a well-known property of the human visual system: human visual perception is highly adaptive and sensitive to structural information in images rather than nonstructural information. Then, based on the saliency map, we propose an iterative self-adaptive segmentation framework for more accurate object segmentation. Extensive tests on a variety of cluttered natural images show that the proposed algorithm is an efficient indicator for characterizing human perception and can provide satisfactory segmentation performance.

Journal ArticleDOI
TL;DR: This work proposes a method to segment the object of interest by finding the “optimal” closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space.
Abstract: Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. The human eyes fixate at important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the “optimal” closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space. The proposed segmentation process is carried out in two separate steps: First, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the “optimal” closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues (edges). In fact, we propose a segmentation refinement process based on such a feedback process. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general purpose visual system.

Journal ArticleDOI
TL;DR: In computer graphics, triangle meshes are ubiquitous as a representation of surface models, and advanced processing algorithms are continuously being proposed that aim at improving performance (compression ratio, watermark robustness and capacity) while minimizing the introduced distortion.
Abstract: In computer graphics, triangle meshes are ubiquitous as a representation of surface models. Processing of this kind of data, such as compression or watermarking, often involves an unwanted distortion of the surface geometry. Advanced processing algorithms are continuously being proposed, aiming at improving performance (compression ratio, watermark robustness and capacity), while minimizing the introduced distortion. In most cases, the final resulting mesh is intended to be viewed by a human observer, and it is therefore necessary to minimise the amount of distortion perceived by the human visual system. However, studies on subjective experiments in this field have been published only recently, showing that previously used objective error measures exhibit rather poor correlation with the results of subjective experiments. In this paper, we present the results of our own large subjective testing aimed at human perception of triangle mesh distortion. We provide an independent confirmation of the previous result by Lavoue et al. that most current metrics perform poorly, with the exception of the MSDM/MSDM2 metrics. We propose a novel metric based on measuring the distortion of dihedral angles, which provides even higher correlation with the results of our experiments and experiments performed by other researchers. Our metric is about two orders of magnitude faster than MSDM/MSDM2, which makes it much more suitable for usage in iterative optimisation algorithms. © 2012 Wiley Periodicals, Inc.
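The core of a dihedral-angle distortion measure is easy to sketch for two meshes sharing connectivity; the absolute-difference pooling below is an assumption, and the paper's metric will differ in its weighting details.

```python
import numpy as np

def dihedral_angles(verts, faces):
    """Angle between the normals of each pair of faces sharing an edge.
    verts: (V, 3) float array; faces: (F, 3) int array."""
    v = verts[faces]
    n = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    edge_faces = {}
    for fi, f in enumerate(faces):
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edge_faces.setdefault(tuple(sorted((a, b))), []).append(fi)
    angles = {}
    for e, fs in edge_faces.items():
        if len(fs) == 2:  # interior edge
            c = np.clip(np.dot(n[fs[0]], n[fs[1]]), -1.0, 1.0)
            angles[e] = np.arccos(c)
    return angles

def dihedral_distortion(verts_ref, verts_dist, faces):
    """Mean absolute change in dihedral angle, assuming shared connectivity."""
    a_ref = dihedral_angles(verts_ref, faces)
    a_dis = dihedral_angles(verts_dist, faces)
    diffs = [abs(a_ref[e] - a_dis[e]) for e in a_ref if e in a_dis]
    return float(np.mean(diffs))
```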

Book ChapterDOI
TL;DR: The concept of logarithmic additive contrast (LAC) is introduced, together with its physical interpretation in terms of transmittance and some resulting properties: it is by definition a grey level and is highly efficient when computed on dark pairs of pixels, with applications to low-lighted images.
Abstract: The logarithmic image processing model is now recognized as a powerful framework to process images acquired in transmitted light and to take into account the human visual system. One of its major interests is linked to the strong mathematical properties it satisfies, allowing the definition and use of rigorous operators. In this paper, we introduce the concept of logarithmic additive contrast (LAC), its physical interpretation based on the notion of transmittance, and some resulting properties: by definition it represents a grey level, and it is highly efficient when computed on dark pairs of pixels, with applications to low-lighted images. Then the LAC is compared with the classical Michelson contrast, showing an explicit link between them. Furthermore, the LAC is demonstrated to be very useful in the fields of automated thresholding and contour detection. Another major interest of the LAC is that it allows the definition of logarithmic metrics, opening various applications: grey-level image comparison, pattern recognition, target tracking, defect detection in industrial vision, and the creation of a new class of automated thresholding algorithms. Another part of the paper is dedicated to a novel notion of logarithmic multiplicative contrast (LMC), which appears as a positive real number and also presents a "physical" interpretation in terms of transmittance. Our research concerning the LMC remains today at an exploratory level if we consider the number of possible ways to deepen this notion. In fact, the LMC values may exceed the grey-scale maximum, which necessitates some normalization to display them as a contrast map. Nevertheless, the LMC is very sensitive near the bright extremity of the grey scale, which is very useful in processing overlighted images. As does the LAC, the LMC generates many new metrics, particularly the Asplund one, and a metric combining information on shapes and grey levels. Until now, Asplund's metric had been defined for binary shapes; it is extended here to grey-level images, with interesting applications to pattern recognition.
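For readers unfamiliar with the LIP framework, here is a tiny sketch of its operations and a pairwise contrast built from them; the exact LAC definition and the grey-scale orientation (some LIP formulations invert the scale so that 0 is white) are assumptions based on the standard LIP literature, not taken from this paper.

```python
# Classical LIP operations (Jourlin-Pinoli style) with grey-scale bound M,
# plus a pairwise contrast defined as the LIP difference of the larger and
# smaller grey level; this LAC form is an assumption from LIP literature.
M = 256.0

def lip_add(f, g):
    """LIP addition: composes two 'transmittance' layers."""
    return f + g - (f * g) / M

def lip_sub(f, g):
    """LIP subtraction (defined for g < M): inverse of lip_add."""
    return M * (f - g) / (M - g)

def lac(f, g):
    """Logarithmic additive contrast of a pair of grey levels; by
    construction the result is itself a grey level in [0, M)."""
    hi, lo = max(f, g), min(f, g)
    return lip_sub(hi, lo)

# Same arithmetic difference, very different LIP contrast:
print(lac(30.0, 10.0), lac(220.0, 200.0))
```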

Journal ArticleDOI
TL;DR: Experimental results show that VGS is competitive with state-of-the-art metrics in terms of prediction precision, reliability, simplicity, and low computational cost.
Abstract: A full-reference image quality assessment (IQA) model by multiscale visual gradient similarity (VGS) is presented. The VGS model adopts a three-stage approach: First, global contrast registration for each scale is applied. Then, pointwise comparison is given by multiplying the similarity of gradient direction with the similarity of gradient magnitude. Third, intrascale pooling is applied, followed by interscale pooling. Several properties of human visual systems on image gradient have been explored and incorporated into the VGS model. It has been found that Stevens' power law is also suitable for gradient magnitude. Other factors such as quality uniformity, visual detection threshold of gradient, and visual frequency sensitivity also affect subjective image quality. The optimal values of two parameters of VGS are trained with existing IQA databases, and good performance of VGS has been verified by cross validation. Experimental results show that VGS is competitive with state-of-the-art metrics in terms of prediction precision, reliability, simplicity, and low computational cost.
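A single-scale plus multiscale sketch of gradient similarity in this spirit; the stabilizing constant, the SSIM-style ratio form, and plain averaging for pooling are assumptions, whereas the paper trains its two parameters on IQA databases and uses its own intra- and interscale pooling.

```python
import numpy as np
from scipy.ndimage import sobel

def vgs_single_scale(ref, dist, c=0.01):
    """Product of gradient direction similarity and gradient magnitude
    similarity, pooled by averaging over the image."""
    gx_r, gy_r = sobel(ref, 1), sobel(ref, 0)
    gx_d, gy_d = sobel(dist, 1), sobel(dist, 0)
    m_r, m_d = np.hypot(gx_r, gy_r), np.hypot(gx_d, gy_d)
    # Direction similarity: normalized inner product of the gradients.
    s_dir = (gx_r * gx_d + gy_r * gy_d + c) / (m_r * m_d + c)
    # Magnitude similarity in the usual SSIM-like ratio form.
    s_mag = (2 * m_r * m_d + c) / (m_r**2 + m_d**2 + c)
    return float(np.mean(np.clip(s_dir, -1, 1) * s_mag))

def vgs(ref, dist, scales=3):
    """Multiscale pooling by simple averaging over dyadic downsampling."""
    vals = []
    for _ in range(scales):
        vals.append(vgs_single_scale(ref, dist))
        ref, dist = ref[::2, ::2], dist[::2, ::2]
    return float(np.mean(vals))
```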

Journal ArticleDOI
Yong Ju Jung, Seong-il Lee, Hosik Sohn, HyunWook Park, Yong Man Ro
TL;DR: A novel visual comfort assessment metric framework is proposed that systematically exploits human visual attention models and quantifies the level of visual discomfort caused by fast salient object motion.
Abstract: Objective assessment of visual comfort for stereoscopic video is of great importance for the stereoscopic image safety issue. We propose a novel visual comfort assessment metric framework that systematically exploits human visual attention models. In a stereoscopic video shot, perceptually significant regions where human subjects pay more attention are likely to play an essential role in determining the overall level of visual comfort. As a specific example of this concept, we develop a visual comfort metric that quantifies the level of visual discomfort caused by fast salient object motion. The performance of the proposed visual comfort metric has been evaluated using natural stereoscopic videos. The experimental results show that the proposed visual comfort metric significantly improves the correlations with subjective judgment.

Journal ArticleDOI
TL;DR: In this article, a total variation (TV) and nonlocal TV regularized model based on Retinex theory is proposed to solve the color constancy problem of the human visual system.
Abstract: A feature of the human visual system (HVS) is color constancy, namely, the ability to determine the color under varying illumination conditions. Retinex theory, formulated by Edwin H. Land, aimed to simulate and explain how the HVS perceives color. In this paper, we establish a total variation (TV) and nonlocal TV regularized model of Retinex theory that can be solved by a fast computational approach based on Bregman iteration. We demonstrate the performance of our method by numerical results.
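One compact way to write down a TV-regularized Retinex model of this kind (a sketch; the paper's exact functional and its nonlocal variant may differ):

```latex
% Sketch of a TV-regularized Retinex functional; s = log S = l + r splits
% the log-image into illumination l and reflectance r, and T_tau zeroes
% gradients below tau, attributing them to smooth illumination. The exact
% model and its nonlocal variant in the paper may differ.
\hat{r} \;=\; \arg\min_{r}\;
  \int_{\Omega} \lvert \nabla r \rvert \, dx
  \;+\; \frac{\mu}{2} \int_{\Omega}
  \bigl\lVert \nabla r - \mathcal{T}_{\tau}(\nabla s) \bigr\rVert_2^2 \, dx
```

Minimizers of functionals like this are what the Bregman iteration mentioned in the abstract computes efficiently.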

Journal ArticleDOI
01 Nov 2012
TL;DR: This work is the first to account for the interplay of luminance contrast (magnitude/frequency) and disparity, and its model predicts the human response to complex stereo-luminance images.
Abstract: Binocular disparity is one of the most important depth cues used by the human visual system. Recently developed stereo-perception models allow us to successfully manipulate disparity in order to improve viewing comfort, depth discrimination as well as stereo content compression and display. Nonetheless, all existing models neglect the substantial influence of luminance on stereo perception. Our work is the first to account for the interplay of luminance contrast (magnitude/frequency) and disparity and our model predicts the human response to complex stereo-luminance images. Besides improving existing disparity-model applications (e.g., difference metrics or compression), our approach offers new possibilities, such as joint luminance contrast and disparity manipulation or the optimization of auto-stereoscopic content. We validate our results in a user study, which also reveals the advantage of considering luminance contrast and its significant impact on disparity manipulation techniques.

Journal ArticleDOI
TL;DR: The proposed full-reference (FR) algorithm is more efficient due to its low complexity, without jeopardizing prediction accuracy; cross-database tests have been carried out to provide a proper perspective on its performance as compared to other VQA methods.
Abstract: Objective video quality assessment (VQA) is the use of computational models to evaluate the video quality in line with the perception of the human visual system (HVS). It is challenging due to the underlying complexity, and the relatively limited understanding of the HVS and its intricate mechanisms. There are three important issues that arise in objective VQA in comparison with image quality assessment: 1) the temporal factors apart from the spatial ones also need to be considered, 2) the contribution of each factor (spatial and temporal) and their interaction to the overall video quality need to be determined, and 3) the computational complexity of the resultant method. In this paper, we seek to tackle the first issue by utilizing the worst-case pooling strategy and the variations of spatial quality along the temporal axis with proper analysis and justification. The second issue is addressed by the use of machine learning; we believe this to be more convincing since the relationship between the factors and the overall quality is derived via training with substantial ground truth (i.e., subjective scores). Experiments conducted using publicly available video databases show the effectiveness of the proposed full-reference (FR) algorithm in comparison to the relevant existing VQA schemes. Focus has also been placed on demonstrating the robustness of the proposed method to new and untrained data. To that end, cross-database tests have been carried out to provide a proper perspective on the performance of the proposed scheme as compared to other VQA methods. The third issue regarding the computational costs also plays a key role in determining the feasibility of a VQA scheme for practical deployment given the large amount of data that needs to be processed/analyzed in real time. A limitation of many existing VQA algorithms is their high computational complexity. In contrast, the proposed scheme is more efficient due to its low complexity, without jeopardizing the prediction accuracy.
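A toy version of worst-case temporal pooling with a variation term; the fraction and the combination weights below are placeholders where the paper uses machine learning trained on subjective scores.

```python
import numpy as np

def temporal_pool(frame_scores, worst_fraction=0.1):
    """Summarize per-frame quality (higher = better) by the mean of the
    worst fraction of frames, penalized by temporal quality fluctuation.
    Both the fraction and the 0.5 weight are illustrative assumptions."""
    s = np.sort(np.asarray(frame_scores, dtype=float))
    k = max(1, int(len(s) * worst_fraction))
    worst = s[:k].mean()                               # heavily distorted moments
    variation = np.abs(np.diff(frame_scores)).mean()   # quality fluctuation
    return worst - 0.5 * variation

print(temporal_pool([0.9, 0.85, 0.4, 0.88, 0.9, 0.87]))
```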

Proceedings ArticleDOI
03 Apr 2012
TL;DR: This paper proposes a completely automated approach for MSRCR that obtains parameter values from the image being enhanced; the parameters used in this enhancement method are image dependent and otherwise have to be varied based on the images under consideration.
Abstract: The dynamic range of a camera is much smaller than that of the human visual system. This causes images taken by the camera to look different from how the scene would have looked to the naked eye. The Multi Scale Retinex with Color Restoration (MSRCR) algorithm enhances images taken under a wide range of nonlinear illumination conditions to the level that a user would have perceived in real time. But the parameters used in this enhancement method are image dependent and have to be varied based on the images under consideration. In this paper, we propose a completely automated approach for MSRCR by obtaining parameter values from the image being enhanced.
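For reference, here is the commonly published form of MSRCR that the proposed automation parameterizes; the scales, alpha, beta and the percentile stretch below are typical literature values, not the automatically derived parameters of this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0, gain=1.0):
    """Multi Scale Retinex with Color Restoration on an RGB uint8 image.
    Constants are typical literature values (assumptions)."""
    img = img.astype(np.float64) + 1.0          # avoid log(0)
    # Multi-scale Retinex: average of log-ratios against Gaussian surrounds.
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = np.stack([gaussian_filter(img[..., c], sigma) for c in range(3)], -1)
        msr += np.log(img) - np.log(blur)
    msr /= len(sigmas)
    # Color restoration factor from channel proportions.
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = gain * crf * msr
    # Percentile stretch back to a displayable range (an assumption; the
    # paper's automation derives such mapping parameters per image).
    lo, hi = np.percentile(out, 1), np.percentile(out, 99)
    return np.clip((out - lo) / (hi - lo) * 255.0, 0, 255).astype(np.uint8)
```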

Journal ArticleDOI
01 Jan 2012
TL;DR: The experimental results show that the present DE2000-based metric is consistent with the human visual system in general application environments.
Abstract: Combining the color difference formula of CIEDE2000 and the printing industry standard for visual verification, we present an objective color image quality assessment method correlated with subjective vision perception. An objective score conforming to subjective perception (OSCSP), Q, is proposed to directly reflect the subjective visual perception. In addition, we present a general method to calibrate correction factors of the color difference formula under real experimental conditions. Our experimental results show that the present DE2000-based metric is consistent with the human visual system in general application environments.
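A minimal sketch of the measurement core, using scikit-image's CIEDE2000 implementation; the paper's calibration of correction factors and its mapping to the OSCSP score are omitted here.

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def de2000_quality(ref_rgb, dist_rgb):
    """Mean per-pixel CIEDE2000 colour difference between a reference and a
    distorted image (float RGB in [0, 1]); lower means a smaller perceived
    difference. The paper's calibration step is not reproduced."""
    lab_r = rgb2lab(ref_rgb)
    lab_d = rgb2lab(dist_rgb)
    return float(np.mean(deltaE_ciede2000(lab_r, lab_d)))
```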

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A content-aware image resizing method which simultaneously preserves both salient image features and important line structure properties: parallelism, collinearity and orientation is proposed.
Abstract: This paper proposes a content-aware image resizing method which simultaneously preserves both salient image features and important line structure properties: parallelism, collinearity and orientation. When there are prominent line structures in the image, image resizing methods that do not explicitly take these properties into account can produce line structure distortions in their results. Since the human visual system is very sensitive to line structures, such distortions often become noticeable and disturbing. Our method couples mesh deformations for image resizing with similarity transforms for line features. Mesh deformations are used to control content preservation while similarity transforms are analyzed in the Hough space to maintain line structure properties. Our method strikes a good balance between preserving content and maintaining line structure properties. Experiments show the proposed method often outperforms methods that do not take line structures into account, especially for scenes with prominent line structures.

Journal ArticleDOI
TL;DR: A novel full-reference video quality metric is developed, which conceptually comprises the following processing steps: decoupling detail losses and additive impairments within each frame for spatial distortion measurement; analyzing the video motion and using HVS characteristics to simulate the human perception of the spatial distortions; and taking into account cognitive human behaviors to integrate frame-level quality scores into a sequence-level quality score.
Abstract: Video quality assessment plays a fundamental role in video processing and communication applications. In this paper, we study the use of motion information and temporal human visual system (HVS) characteristics for objective video quality assessment. In our previous work, two types of spatial distortions, i.e., detail losses and additive impairments, are decoupled and evaluated separately for spatial quality assessment. The detail losses refer to the loss of useful visual information that will affect the content visibility, and the additive impairments represent the redundant visual information in the test image, such as the blocking or ringing artifacts caused by data compression and so on. In this paper, a novel full-reference video quality metric is developed, which conceptually comprises the following processing steps: 1) decoupling detail losses and additive impairments within each frame for spatial distortion measure; 2) analyzing the video motion and using the HVS characteristics to simulate the human perception of the spatial distortions; and 3) taking into account cognitive human behaviors to integrate frame-level quality scores into sequence-level quality score. Distinguished from most studies in the literature, the proposed method comprehensively investigates the use of motion information in the simulation of HVS processing, e.g., to model the eye movement, to predict the spatio-temporal HVS contrast sensitivity, to implement the temporal masking effect, and so on. Furthermore, we also prove the effectiveness of decoupling detail losses and additive impairments for video quality assessment. The proposed method is tested on two subjective quality video databases, LIVE and IVP, and demonstrates the state-of-the-art performance in matching subjective ratings.

Journal ArticleDOI
TL;DR: Algorithms are presented to construct synthetic images in which local image statistics--including luminance distributions, pair-wise correlations, and higher-order correlations--are explicitly specified and all other statistics are determined implicitly by maximum entropy, to measure the sensitivity of the human visual system to local image statistics and to sample their interactions.
Abstract: The space of visual signals is high-dimensional and natural visual images have a highly complex statistical structure. While many studies suggest that only a limited number of image statistics are used for perceptual judgments, a full understanding of visual function requires analysis not only of the impact of individual image statistics, but also, how they interact. In natural images, these statistical elements (luminance distributions, correlations of low and high order, edges, occlusions, etc.) are intermixed, and their effects are difficult to disentangle. Thus, there is a need for construction of stimuli in which one or more statistical elements are introduced in a controlled fashion, so that their individual and joint contributions can be analyzed. With this as motivation, we present algorithms to construct synthetic images in which local image statistics—including luminance distributions, pair-wise correlations, and higher-order correlations—are explicitly specified and all other statistics are determined implicitly by maximum entropy. We then apply this approach to measure the sensitivity of the human visual system to local image statistics and to sample their interactions.
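A toy instance of the idea for the simplest case, a binary texture with one specified pairwise statistic: rows are generated as Markov chains so that the horizontal nearest-neighbour correlation is pinned while everything else stays random. This stands in for the paper's general maximum-entropy construction, which handles many statistics jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_texture_with_pair_corr(shape, corr):
    """Binary image whose horizontal nearest-neighbour correlation (in the
    +/-1 convention) equals `corr`: each pixel copies its left neighbour
    with probability (1 + corr) / 2, independently per row."""
    h, w = shape
    p_same = (1.0 + corr) / 2.0
    img = np.zeros((h, w), dtype=np.uint8)
    img[:, 0] = rng.integers(0, 2, size=h)
    for j in range(1, w):
        same = rng.random(h) < p_same
        img[:, j] = np.where(same, img[:, j - 1], 1 - img[:, j - 1])
    return img

tex = binary_texture_with_pair_corr((256, 256), corr=0.6)
# Empirical check of the specified statistic:
s = 2 * tex.astype(int) - 1
print((s[:, :-1] * s[:, 1:]).mean())  # approximately 0.6
```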