
Showing papers on "Human visual system model published in 2013"


Journal ArticleDOI
TL;DR: A flexible framework that allows for LPC computation in arbitrary fractional scales is proposed, and a new no-reference sharpness assessment algorithm is developed that demonstrates competitive performance when compared with state-of-the-art algorithms.
Abstract: Sharpness is an important determinant in visual assessment of image quality. The human visual system is able to effortlessly detect blur and evaluate sharpness of visual images, but the underlying mechanism is not fully understood. Existing blur/sharpness evaluation algorithms are mostly based on edge width, local gradient, or energy reduction of global/local high frequency content. Here we approach the subject from a different perspective, where sharpness is identified as strong local phase coherence (LPC) near distinctive image features evaluated in the complex wavelet transform domain. Previous LPC computation was restricted to complex coefficients spread across three consecutive dyadic scales in scale-space. Here we propose a flexible framework that allows for LPC computation in arbitrary fractional scales. We then develop a new sharpness assessment algorithm that does not reference the original image. We use four subject-rated publicly available image databases to test the proposed algorithm, which demonstrates competitive performance when compared with state-of-the-art algorithms.

254 citations
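
The abstract notes that most existing sharpness measures rely on edge width or local gradient energy. As a minimal sketch of such a gradient baseline (not the authors' wavelet-domain LPC method; all names here are illustrative), one can score sharpness as the mean gradient magnitude and verify that blurring lowers the score:

```python
import numpy as np

def gradient_sharpness(img):
    """No-reference sharpness score: mean local gradient magnitude.

    A toy stand-in for the gradient-based baselines the abstract
    mentions; the paper's own LPC method works in the complex wavelet
    domain and is not reproduced here.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    return float(np.mean(np.hypot(gx, gy)))

def box_blur(img, k=5):
    """Separable box blur used to simulate defocus."""
    img = np.asarray(img, dtype=float)
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = box_blur(sharp)
# Blurring removes high-frequency content, so the score drops.
```

Comparing `gradient_sharpness(sharp)` against `gradient_sharpness(blurred)` illustrates the basic monotonicity any sharpness metric must satisfy.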


Journal ArticleDOI
TL;DR: Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics.
Abstract: Objective image quality assessment (IQA) aims to evaluate image quality consistently with human perception. Most of the existing perceptual IQA metrics cannot accurately represent degradations from different types of distortion; e.g., existing structural similarity metrics perform well on content-dependent distortions but not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of the existing IQA metrics with the guide of the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to decompose an input scene into two portions: the predicted portion with the predicted visual content, and the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure its degradation; distortions on the disorderly portion mainly change the uncertain information, and PSNR is employed for it. Finally, according to the noise energy deployment on the two portions, we combine the two evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with the state-of-the-art quality metrics.

238 citations
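
The IGM decomposition described above is concrete enough to sketch end to end: a predictor splits each image into a predicted portion and a residual, the predicted portions are compared with a structural-similarity term, the residuals with PSNR, and the two results are combined by distortion energy. The local-mean predictor, the single-window SSIM, and the 50 dB PSNR normalization below are simplifying assumptions, not the paper's exact choices:

```python
import numpy as np

def predict(img, k=3):
    """Local-mean predictor: a crude stand-in for the paper's
    autoregressive prediction of the orderly image content."""
    img = np.asarray(img, dtype=float)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def psnr(a, b, peak=1.0):
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse) if mse > 0 else np.inf

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM over the whole image (the paper uses the
    standard windowed SSIM)."""
    a, b = np.asarray(a, float).ravel(), np.asarray(b, float).ravel()
    ma, mb = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = np.mean((a - ma) * (b - mb))
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / ((ma**2 + mb**2 + c1) * (va + vb + c2))

def igm_quality(ref, dist):
    """SSIM on the predicted portions, PSNR on the residuals, combined
    by where the distortion energy landed (a simplification of the
    paper's combination rule; the 50 dB cap normalizes PSNR to [0,1])."""
    pr, pd = predict(ref), predict(dist)
    rr, rd = ref - pr, dist - pd
    e_pred = np.mean((pr - pd) ** 2)
    e_res = np.mean((rr - rd) ** 2)
    w = e_pred / (e_pred + e_res + 1e-12)
    return w * global_ssim(pr, pd) + (1 - w) * min(psnr(rr, rd), 50.0) / 50.0

rng = np.random.default_rng(1)
ref = rng.random((32, 32))
noisy = np.clip(ref + 0.2 * rng.standard_normal((32, 32)), 0, 1)
# An undistorted image scores 1.0; additive noise lowers the score.
```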


Journal ArticleDOI
26 Jul 2013
TL;DR: The principles and methods of modern algorithms for automatically predicting the quality of visual signals are discussed and divided into understandable modeling subproblems by casting the problem as analogous to assessing the efficacy of a visual communication system.
Abstract: Finding ways to monitor and control the perceptual quality of digital visual media has become a pressing concern as the volume being transported and viewed continues to increase exponentially. This paper discusses the principles and methods of modern algorithms for automatically predicting the quality of visual signals. By casting the problem as analogous to assessing the efficacy of a visual communication system, it is possible to divide the quality assessment problem into understandable modeling subproblems. Along the way, we will visit models of natural images and videos, of visual perception, and a broad spectrum of applications.

206 citations


Journal ArticleDOI
TL;DR: An efficient key frame extraction method based on a visual attention model, which uses temporal gradient based dynamic visual saliency detection instead of traditional optical flow methods to provide summaries of videos in the form of key frames.
Abstract: The huge amount of video data on the internet requires efficient video browsing and retrieval strategies. One of the viable solutions is to provide summaries of the videos in the form of key frames. The video summarization using visual attention modeling has been used of late. In such schemes, the visually salient frames are extracted as key frames on the basis of theories of human attention modeling. The visual attention modeling schemes have proved to be effective in video summarization. However, the high computational costs incurred by these techniques limit their applicability in practical scenarios. In this context, this paper proposes an efficient visual attention model based key frame extraction method. The computational cost is reduced by using the temporal gradient based dynamic visual saliency detection instead of the traditional optical flow methods. Moreover, for static visual saliency, an effective method employing discrete cosine transform has been used. The static and dynamic visual attention measures are fused by using a non-linear weighted fusion method. The experimental results indicate that the proposed method is not only efficient, but also yields high quality video summaries.

180 citations
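
The cost-saving idea above translates directly into code: dynamic saliency is just the mean absolute temporal gradient between consecutive frames, and the frames where it peaks become key frame candidates. This sketch omits the DCT-based static saliency and the non-linear fusion; the function names are illustrative:

```python
import numpy as np

def dynamic_saliency(frames):
    """Per-frame dynamic saliency: mean absolute temporal gradient,
    the cheap replacement for optical flow described in the abstract."""
    frames = np.asarray(frames, dtype=float)
    grad = np.abs(np.diff(frames, axis=0))     # |frame_t - frame_{t-1}|
    scores = grad.mean(axis=(1, 2))
    return np.concatenate(([0.0], scores))     # first frame has no predecessor

def key_frames(frames, top_k=2):
    """Pick the top_k frames with the highest dynamic saliency."""
    scores = dynamic_saliency(frames)
    return sorted(np.argsort(scores)[-top_k:].tolist())

# Synthetic clip: mostly static, with a sudden scene change at frame 5.
clip = np.zeros((10, 16, 16))
clip[5:] = 1.0
```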


Proceedings ArticleDOI
01 Sep 2013
TL;DR: This paper proposes a novel conceptually simple salient region detection method, namely SDSP, by combining three simple priors, indicating that SDSP could outperform the other state-of-the-art algorithms by yielding higher saliency prediction accuracy.
Abstract: Salient region detection from images is an important and fundamental research problem in neuroscience and psychology, and it serves as an indispensable step for numerous machine vision tasks. In this paper, we propose a novel, conceptually simple salient region detection method, namely SDSP, by combining three simple priors. First, the human visual system's detection of salient objects in a visual scene can be well modeled by band-pass filtering. Second, people are more likely to pay attention to the center of an image. Third, warm colors are more attractive to people than cold colors. Extensive experiments conducted on the benchmark dataset indicate that SDSP could outperform the other state-of-the-art algorithms by yielding higher saliency prediction accuracy. Moreover, SDSP has quite low computational complexity, rendering it an outstanding candidate for time-critical applications. The Matlab source code of SDSP and the evaluation results have been made available online at http://sse.tongji.edu.cn/linzhang/va/SDSP/SDSP.htm.

175 citations
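
Two of the three SDSP priors translate into a few lines of code. The sketch below combines a Gaussian center prior with a warm-color prior; the band-pass frequency prior is omitted, and the RGB warmth proxy is our own assumption (the paper computes warmth in CIE Lab):

```python
import numpy as np

def center_prior(h, w, sigma_ratio=0.25):
    """Gaussian centered on the image: pixels near the center get a
    higher prior saliency."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sigma2 = (sigma_ratio * max(h, w)) ** 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma2))

def warmth_prior(rgb):
    """Warm-color prior: red/yellow content attracts attention. The
    red-minus-blue proxy below is a simplifying assumption; the paper
    works in CIE Lab."""
    rgb = np.asarray(rgb, dtype=float)
    warmth = np.clip(rgb[..., 0] - rgb[..., 2], 0, None)
    return warmth / (warmth.max() + 1e-12)

def sdsp_sketch(rgb):
    """Combine the two priors multiplicatively (SDSP's band-pass
    'frequency prior' is omitted in this sketch)."""
    h, w = rgb.shape[:2]
    return center_prior(h, w) * warmth_prior(rgb)

# A red patch at the center of a blue background should pop out.
img = np.zeros((32, 32, 3))
img[..., 2] = 1.0                     # blue background
img[12:20, 12:20] = [1.0, 0.0, 0.0]   # red patch
sal = sdsp_sketch(img)
```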


Journal ArticleDOI
TL;DR: A novel RR IQA index based on visual information fidelity is proposed, advocating that distortions on the primary visual information mainly disturb image understanding, and distortions in the residual uncertainty mainly change the comfort of perception.
Abstract: Reduced-reference (RR) image quality assessment (IQA) aims to use less data about the reference image while achieving higher evaluation accuracy. Recent research on brain theory suggests that the human visual system (HVS) actively predicts the primary visual information and tries to avoid the residual uncertainty for image perception and understanding. Therefore, the perceptual quality relies on the information fidelities of the primary visual information and the residual uncertainty. In this paper, we propose a novel RR IQA index based on visual information fidelity. We advocate that distortions on the primary visual information mainly disturb image understanding, and distortions on the residual uncertainty mainly change the comfort of perception. We separately compute the quantities of the primary visual information and the residual uncertainty of an image. Then the fidelities of the two types of information are separately evaluated for quality assessment. Experimental results demonstrate that the proposed index uses very little reference data (30 bits) and achieves high consistency with human perception.

157 citations


Journal ArticleDOI
TL;DR: A quality metric called sparse feature fidelity (SFF) is proposed for full-reference image quality assessment (IQA) on the basis of transformation of images into sparse representations in the primary visual cortex based on the sparse features acquired by a feature detector, which is trained on samples of natural images by an ICA algorithm.
Abstract: The prediction of an image quality metric (IQM) should be consistent with subjective human evaluation. As the human visual system (HVS) is critical to visual perception, modeling of the HVS is regarded as the most suitable way to achieve perceptual quality predictions. Sparse coding that is equivalent to independent component analysis (ICA) can provide a very good description of the receptive fields of simple cells in the primary visual cortex, which is the most important part of the HVS. With this inspiration, a quality metric called sparse feature fidelity (SFF) is proposed for full-reference image quality assessment (IQA) on the basis of transformation of images into sparse representations in the primary visual cortex. The proposed method is based on sparse features that are acquired by a feature detector, which is trained on samples of natural images by an ICA algorithm. In addition, two strategies are designed to simulate the properties of visual perception: 1) visual attention and 2) visual threshold. The computation of SFF has two stages, training and fidelity computation; the fidelity computation itself consists of two components, feature similarity and luminance correlation. The feature similarity measures the structural differences between the two images, whereas the luminance correlation evaluates brightness distortions. SFF also reflects the chromatic properties of the HVS, and it is very effective for color IQA. The experimental results on five image databases show that SFF has a better performance in matching subjective ratings compared with the leading IQMs.

149 citations


Journal ArticleDOI
TL;DR: It is shown that, when human observers categorize global information in real-world scenes, the brain exhibits strong sensitivity to low-level summary statistics, and that global scene information may be computed by spatial pooling of responses from early visual areas (e.g., LGN or V1).
Abstract: The visual system processes natural scenes in a split second. Part of this process is the extraction of "gist," a global first impression. It is unclear, however, how the human visual system computes this information. Here, we show that, when human observers categorize global information in real-world scenes, the brain exhibits strong sensitivity to low-level summary statistics. Subjects rated a specific instance of a global scene property, naturalness, for a large set of natural scenes while EEG was recorded. For each individual scene, we derived two physiologically plausible summary statistics by spatially pooling local contrast filter outputs: contrast energy (CE), indexing contrast strength, and spatial coherence (SC), indexing scene fragmentation. We show that behavioral performance is directly related to these statistics, with naturalness rating being influenced in particular by SC. At the neural level, both statistics parametrically modulated single-trial event-related potential amplitudes during an early, transient window (100-150 ms), but SC continued to influence activity levels later in time (up to 250 ms). In addition, the magnitude of neural activity that discriminated between man-made versus natural ratings of individual trials was related to SC, but not CE. These results suggest that global scene information may be computed by spatial pooling of responses from early visual areas (e.g., LGN or V1). The increased sensitivity over time to SC in particular, which reflects scene fragmentation, suggests that this statistic is actively exploited to estimate scene naturalness.

113 citations


Journal ArticleDOI
TL;DR: Transcranial magnetic stimulation is used to disrupt signaling in V1/V2 and in the lateral occipital (LO) area at different moments in time while participants performed a discrimination task involving a Kanizsa-type illusory figure, showing that both V1/V2 and higher-level visual area LO are critically involved in perceptual completion.
Abstract: A striking example of the constructive nature of visual perception is how the human visual system completes contours of occluded objects. To date, it is unclear whether perceptual completion emerges during early stages of visual processing or whether higher-level mechanisms are necessary. To answer this question, we used transcranial magnetic stimulation to disrupt signaling in V1/V2 and in the lateral occipital (LO) area at different moments in time while participants performed a discrimination task involving a Kanizsa-type illusory figure. Results show that both V1/V2 and higher-level visual area LO are critically involved in perceptual completion. However, these areas seem to be involved in an inverse hierarchical fashion, in which the critical time window for V1/V2 follows that for LO. These results are in line with the growing evidence that feedback to V1/V2 contributes to perceptual completion.

106 citations


Journal ArticleDOI
TL;DR: A novel framework for medical image fusion based on the framelet transform is proposed, considering the characteristics of the human visual system (HVS): all source images are decomposed by the framelet transform.
Abstract: Multi-modal medical image fusion, as a powerful tool for clinical applications, has developed with the advent of various imaging modalities in medical imaging. The main motivation is to capture the most relevant information from sources into a single output, which plays an important role in medical diagnosis. In this paper, a novel framework for medical image fusion based on the framelet transform is proposed, considering the characteristics of the human visual system (HVS). The core idea behind the proposed framework is to decompose all source images by the framelet transform. Two different HVS-inspired fusion rules are proposed for combining the low- and high-frequency coefficients respectively. The former is based on the visibility measure, and the latter is based on the texture information. Finally, the fused image is constructed by the inverse framelet transform with all composite coefficients. Experimental results highlight the expediency and suitability of the proposed framework. The efficiency of the proposed method is demonstrated by different experiments on different multi-modal medical images. Further, the enhanced performance of the proposed framework is understood from the comparison with existing algorithms.

105 citations
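
The two fusion rules above can be illustrated with the framelet transform swapped for a one-level blur/detail split (a deliberate simplification): the low band is taken from the source with the higher visibility measure, and each high-band coefficient from the source with more local texture energy:

```python
import numpy as np

def box_blur(img, k=5):
    img = np.asarray(img, dtype=float)
    kern = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode="same"), 0, out)

def fuse(a, b, k=5):
    """Fuse two co-registered images. The framelet transform is replaced
    by a one-level blur/detail split; the rules follow the abstract's
    spirit: low band by a visibility measure, high band by texture."""
    la, lb = box_blur(a, k), box_blur(b, k)
    ha, hb = a - la, b - lb
    # Visibility measure: mean deviation from the local mean, per image.
    vis_a, vis_b = np.abs(a - la).mean(), np.abs(b - lb).mean()
    low = la if vis_a >= vis_b else lb
    # Texture rule: keep the detail coefficient with larger local energy.
    ea, eb = box_blur(ha ** 2, k), box_blur(hb ** 2, k)
    high = np.where(ea >= eb, ha, hb)
    return low + high

# Two modalities, each informative in a different half of the image:
# the fused result should carry detail from both.
rng = np.random.default_rng(2)
a = np.zeros((32, 32)); a[:, :16] = rng.random((32, 16))
b = np.zeros((32, 32)); b[:, 16:] = rng.random((32, 16))
fused = fuse(a, b)
```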


Journal ArticleDOI
TL;DR: A comparison with state-of-the-art metrics shows that the proposed Perceptual Sharpness Index correlates highly with human perception and exhibits low computational complexity.
Abstract: In this letter, a no-reference perceptual sharpness metric based on a statistical analysis of local edge gradients is presented. The method takes properties of the human visual system into account. Based on perceptual properties, a relationship between the extracted statistical features and the metric score is established to form a Perceptual Sharpness Index (PSI). A comparison with state-of-the-art metrics shows that the proposed method correlates highly with human perception and exhibits low computational complexity. In contrast to existing metrics, the PSI performs well for a wide range of blurriness and shows a high degree of invariance for different image contents.

Journal ArticleDOI
TL;DR: A non-expanded block-based progressive visual secret sharing scheme with noise-like and meaningful shares is proposed, offering several advantages over related methods, including suitability for both grayscale and color secret images.

Journal ArticleDOI
TL;DR: In this paper, a patch-based retargeting scheme with an extended significance measurement is introduced to preserve shapes of both visually salient objects and structure lines while minimizing visual distortions.
Abstract: Image retargeting is the process of adapting images to fit displays with various aspect ratios and sizes. Most studies on image retargeting focus on shape preservation, but they do not fully consider the preservation of structure lines, to which the human visual system is sensitive. In this paper, a patch-based retargeting scheme with an extended significance measurement is introduced to preserve the shapes of both visually salient objects and structure lines while minimizing visual distortions. In the proposed scheme, a similarity transformation constraint is used to force visually salient content to undergo as-rigid-as-possible deformation, while an optimization process is performed to smoothly propagate distortions. These processes enable our approach to yield pleasing content-aware warping and retargeting. Experimental results and a user study show that our results are better than those generated by state-of-the-art approaches.

Journal ArticleDOI
TL;DR: An overview of perceptual based approaches for image enhancement, segmentation and coding, and a brief review of image quality assessment methods, which are used to evaluate the performance of visual information processing techniques.
Abstract: Perceptual approaches have been widely used in many areas of visual information processing. This paper presents an overview of perceptual based approaches for image enhancement, segmentation and coding. The paper also provides a brief review of image quality assessment (IQA) methods, which are used to evaluate the performance of visual information processing techniques. The intent of this paper is not to review all the relevant works that have appeared in the literature, but rather to focus on few topics that have been extensively researched and developed over the past few decades. The goal is to present a perspective as broad as possible on this actively evolving domain due to relevant advances in vision research and signal processing. Therefore, for each topic, we identify the main contributions of perceptual approaches and their limitations, and outline how perceptual vision has influenced current state-of-the-art techniques in image enhancement, segmentation, coding and visual information quality assessment.

Journal ArticleDOI
06 Aug 2013
TL;DR: A short history of advances in perceptual visual signal compression is presented, and perceptual models and how they are embedded into systems for compression and transmission are described, both with and without current compression standards.
Abstract: One- and two-way communication with digital compressed visual signals is now an integral part of the daily life of millions. Such commonplace use has been realized by decades of advances in visual signal compression. The design of effective, efficient compression and transmission strategies for visual signals may benefit from proper incorporation of human visual system (HVS) characteristics. This paper overviews psychophysics and engineering associated with the communication of visual signals. It presents a short history of advances in perceptual visual signal compression, and describes perceptual models and how they are embedded into systems for compression and transmission, both with and without current compression standards.

Journal ArticleDOI
Yong Ju Jung1, Hosik Sohn1, Seong-il Lee1, HyunWook Park1, Yong Man Ro1 
TL;DR: A new objective assessment method for visual discomfort of stereoscopic images that makes effective use of the human visual attention model and can achieve significantly higher prediction accuracy than the state-of-the-art methods.
Abstract: We introduce a new objective assessment method for visual discomfort of stereoscopic images that makes effective use of the human visual attention model. The proposed method takes into account visual importance regions that play an important role in determining the overall degree of visual discomfort of a stereoscopic image. After obtaining a saliency-based visual importance map for an image, perceptually significant disparity features are extracted to predict the overall degree of visual discomfort. Experimental results show that the proposed method can achieve significantly higher prediction accuracy than the state-of-the-art methods.
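
The core computation above reduces to saliency-weighted disparity statistics: visually important regions dominate the predicted discomfort. The specific features and their combination below are illustrative assumptions, not the paper's trained predictor:

```python
import numpy as np

def discomfort_features(disparity, saliency):
    """Saliency-weighted mean and spread of the disparity map.
    Illustrative features; the paper extracts its own perceptually
    significant disparity features."""
    d = np.asarray(disparity, dtype=float)
    w = np.asarray(saliency, dtype=float)
    w = w / w.sum()
    mean = float((w * d).sum())
    spread = float(np.sqrt((w * (d - mean) ** 2).sum()))
    return mean, spread

def discomfort_score(disparity, saliency):
    """Larger weighted |disparity| and spread -> more discomfort."""
    mean, spread = discomfort_features(disparity, saliency)
    return abs(mean) + spread

# A saliency map concentrated on one region, and two disparity maps:
# one comfortable (zero disparity), one with large disparity exactly
# where the viewer is likely to look.
sal = np.full((8, 8), 1e-3); sal[2:6, 2:6] = 1.0
comfortable = np.zeros((8, 8))
uncomfortable = np.zeros((8, 8)); uncomfortable[2:6, 2:6] = 2.0
```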

Journal ArticleDOI
TL;DR: A multi-scale image enhancement algorithm based on a new parametric contrast measure that incorporates not only the luminance masking characteristic, but also the contrast masking characteristic of the human visual system is presented.
Abstract: Image enhancement is a crucial pre-processing step for various image processing applications and vision systems. Many enhancement algorithms have been proposed based on different sets of criteria. However, a direct multi-scale image enhancement algorithm capable of independently and/or simultaneously providing adequate contrast enhancement, tonal rendition, dynamic range compression, and accurate edge preservation in a controlled manner has yet to be produced. In this paper, a multi-scale image enhancement algorithm based on a new parametric contrast measure is presented. The parametric contrast measure incorporates not only the luminance masking characteristic, but also the contrast masking characteristic of the human visual system. The formulation of the contrast measure can be adapted for any multi-resolution decomposition scheme in order to yield new human visual system-inspired multi-scale transforms. In this article, it is exemplified using the Laplacian pyramid, discrete wavelet transform, stationary wavelet transform, and dual-tree complex wavelet transform. Consequently, the proposed enhancement procedure is developed. The advantages of the proposed method include: 1) the integration of both the luminance and contrast masking phenomena; 2) the extension of non-linear mapping schemes to human visual system inspired multi-scale contrast coefficients; 3) the extension of human visual system-based image enhancement approaches to the stationary and dual-tree complex wavelet transforms, and a direct means of; 4) adjusting overall brightness; and 5) achieving dynamic range compression for image enhancement within a direct multi-scale enhancement framework. Experimental results demonstrate the ability of the proposed algorithm to achieve simultaneous local and global enhancements.
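
A single-scale version of the luminance-masked contrast measure conveys the core idea: detail is normalized by local luminance before being amplified, so equal boosts are perceptually comparable in dark and bright regions. The paper applies this across full multi-resolution transforms; the single scale and the constants here are simplifying assumptions:

```python
import numpy as np

def box_blur(img, k=5):
    img = np.asarray(img, dtype=float)
    kern = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode="same"), 0, out)

def enhance(img, gain=2.0, k=5, eps=1e-3):
    """Single-scale, luminance-masked contrast enhancement sketch:
    contrast = detail / local luminance (the masking step), which is
    then amplified and recombined with the base layer."""
    img = np.asarray(img, dtype=float)
    base = box_blur(img, k)
    contrast = (img - base) / (base + eps)   # luminance-masked contrast
    return np.clip(base * (1 + gain * contrast), 0, 1)

# A low-contrast sinusoidal grating: enhancement should increase its
# contrast while keeping values in [0, 1].
xs = np.linspace(0, 8 * np.pi, 64)
img = np.tile(0.5 + 0.1 * np.sin(xs), (64, 1))
```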

Journal ArticleDOI
TL;DR: A new paradigm for controlled psychophysical studies of local natural image regularities is presented and discrimination performance was accurately predicted by model likelihood, an information theoretic measure of model efficacy, indicating that the visual system possesses a surprisingly detailed knowledge of natural image higher-order correlations, much more so than current image models.
Abstract: A key hypothesis in sensory system neuroscience is that sensory representations are adapted to the statistical regularities in sensory signals and thereby incorporate knowledge about the outside world. Supporting this hypothesis, several probabilistic models of local natural image regularities have been proposed that reproduce neural response properties. Although many such physiological links have been made, these models have not been linked directly to visual sensitivity. Previous psychophysical studies of sensitivity to natural image regularities focus on global perception of large images, but much less is known about sensitivity to local natural image regularities. We present a new paradigm for controlled psychophysical studies of local natural image regularities and compare how well such models capture perceptually relevant image content. To produce stimuli with precise statistics, we start with a set of patches cut from natural images and alter their content to generate a matched set whose joint statistics are equally likely under a probabilistic natural image model. The task is forced choice to discriminate natural patches from model patches. The results show that human observers can learn to discriminate the higher-order regularities in natural images from those of model samples after very few exposures and that no current model is perfect for patches as small as 5 by 5 pixels or larger. Discrimination performance was accurately predicted by model likelihood, an information theoretic measure of model efficacy, indicating that the visual system possesses a surprisingly detailed knowledge of natural image higher-order correlations, much more so than current image models. We also perform three cue identification experiments to interpret how model features correspond to perceptually relevant image features.

Journal ArticleDOI
Jin Sun1, Haibin Ling1
TL;DR: A new thumbnailing framework named scale and object aware thumbnailing (SOAT), which contains two components focusing respectively on saliency measure and thumbnail warping/cropping, demonstrated promising performance in comparison with state-of-the-art algorithms.
Abstract: In this paper, we study effective approaches to create thumbnails from input images. Since a thumbnail will eventually be presented to and perceived by a human visual system, a thumbnailing algorithm should consider several important issues in the process, including thumbnail scale, object completeness and local structure smoothness. To address these issues, we propose a new thumbnailing framework named scale and object aware thumbnailing (SOAT), which contains two components focusing respectively on saliency measure and thumbnail warping/cropping. The first component, named scale and object aware saliency (SOAS), models the human perception of thumbnails using visual acuity theory, which takes thumbnail scale into consideration. In addition, the "objectness" measurement (Alexe et al. 2012) is integrated in SOAS, so as to preserve object completeness. The second component uses SOAS to guide the thumbnailing based on either retargeting or cropping. The retargeting version uses thin-plate-spline (TPS) warping for preserving structure smoothness. An extended seam carving algorithm is developed to sample control points used for TPS model estimation. The cropping version searches a cropping window that balances the spatial efficiency and SOAS-based content preservation. The proposed algorithms were evaluated in three experiments: a quantitative user study to evaluate thumbnail browsing efficiency, a quantitative user study for subject preference, and a qualitative study on the RetargetMe dataset. In all studies, SOAT demonstrated promising performance in comparison with state-of-the-art algorithms.
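
The cropping variant is easy to sketch: exhaustively search for the window with the largest contained saliency, using an integral image so each window sum costs O(1). This stands in for SOAT's SOAS-guided search and ignores the spatial-efficiency trade-off the paper balances:

```python
import numpy as np

def best_crop(saliency, ch, cw):
    """Return the top-left corner of the ch-by-cw window with maximal
    total saliency. A miniature of SOAT's cropping mode; the saliency
    map itself would come from SOAS in the paper."""
    s = np.asarray(saliency, dtype=float)
    h, w = s.shape
    # Integral image: window sums in O(1) per candidate.
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = s.cumsum(0).cumsum(1)
    best, best_yx = -np.inf, (0, 0)
    for y in range(h - ch + 1):
        for x in range(w - cw + 1):
            total = ii[y + ch, x + cw] - ii[y, x + cw] - ii[y + ch, x] + ii[y, x]
            if total > best:
                best, best_yx = total, (y, x)
    return best_yx

# A single salient object: the best crop should cover it entirely.
sal = np.zeros((20, 20))
sal[12:16, 3:7] = 1.0
```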

Proceedings ArticleDOI
23 Jun 2013
TL;DR: Results show that the proposed framework for boundary detection in complex natural scenes has excellent ability to flexibly capture both the structured chromatic and achromatic boundaries in complex scenes.
Abstract: Color information plays an important role in better understanding of natural scenes, at least by facilitating the discrimination of boundaries of objects or areas. In this study, we propose a new framework for boundary detection in complex natural scenes based on the color-opponent mechanisms of the visual system. The red-green and blue-yellow color opponent channels in the human visual system are regarded as the building blocks for various color perception tasks such as boundary detection. The proposed framework is a feed-forward hierarchical model, which has a direct counterpart to the color-opponent mechanisms involved in the pathway from the retina to the primary visual cortex (V1). Results show that our simple framework has an excellent ability to flexibly capture both the structured chromatic and achromatic boundaries in complex scenes.
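
The opponent front end can be sketched in a few lines: build red-green and blue-yellow channels and take their gradient magnitudes. The paper's hierarchical double-opponent stages are omitted. On an equiluminant red/green edge, a luminance-only detector sees nothing while the opponent channels respond:

```python
import numpy as np

def opponent_boundaries(rgb):
    """Boundary map from red-green and blue-yellow opponent channels:
    a bare-bones version of the single-opponent front end the paper
    builds on."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                 # red-green opponent channel
    by = b - (r + g) / 2       # blue-yellow opponent channel
    edges = np.zeros(rgb.shape[:2])
    for chan in (rg, by):
        gy, gx = np.gradient(chan)
        edges += np.hypot(gx, gy)
    return edges

# Equiluminant red/green halves: luminance is constant across the
# image, but the red-green channel flips sign at the midline.
img = np.zeros((16, 16, 3))
img[:, :8, 0] = 0.5   # left half: red
img[:, 8:, 1] = 0.5   # right half: green
edges = opponent_boundaries(img)
```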

Journal ArticleDOI
TL;DR: A pattern masking function based on luminance contrast and structural uncertainty is proposed and extended to just noticeable difference (JND) estimation, yielding a novel pixel domain JND model that is more consistent with the HVS than the existing JND models.
Abstract: A model of visual masking, which reveals the visibility of stimuli in the human visual system (HVS), is useful in perceptual based image/video processing. The existing visual masking function mainly considers luminance contrast, which always overestimates the visibility threshold of the edge region and underestimates that of the texture region. Recent research on visual perception indicates that the HVS is sensitive to orderly regions that possess regular structures and insensitive to disorderly regions that possess uncertain structures. Therefore, structural uncertainty is another determining factor in visual masking. In this paper, we introduce a novel pattern masking function based on both luminance contrast and structural uncertainty. By mimicking the internal generative mechanism of the HVS, a prediction model is first employed to separate out the unpredictable uncertainty from an input image. In addition, an improved local binary pattern is introduced to compute the structural uncertainty. Finally, combining luminance contrast with structural uncertainty, the pattern masking function is deduced. Experimental results demonstrate that the proposed pattern masking function outperforms the existing visual masking function. Furthermore, we extend the pattern masking function to just noticeable difference (JND) estimation and introduce a novel pixel domain JND model. A subjective viewing test confirms that the proposed JND model is more consistent with the HVS than the existing JND models.
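
For context, a classic pixel-domain JND profile (in the style of Chou and Li) combines a luminance-adaptation term with a contrast-masking term; the paper's contribution is to refine the masking term with structural uncertainty from local binary patterns, which this sketch does not reproduce. All constants below are illustrative:

```python
import numpy as np

def jnd_map(img):
    """Pixel-domain JND sketch: the visibility threshold at each pixel
    is the larger of a luminance-adaptation term (thresholds rise in
    dark and bright regions) and a contrast-masking term (thresholds
    rise with local activity). Constants are illustrative; input is
    expected in [0, 255]."""
    img = np.asarray(img, dtype=float)
    k, pad = 3, 1
    padded = np.pad(img, pad, mode="edge")
    mean = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            mean += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    mean /= k * k
    # Luminance adaptation term.
    lum = np.where(mean <= 127,
                   17 * (1 - np.sqrt(mean / 127)) + 3,
                   3 / 128 * (mean - 127) + 3)
    # Contrast masking term: proportional to local gradient magnitude.
    gy, gx = np.gradient(img)
    contrast = 0.1 * np.hypot(gx, gy)
    return np.maximum(lum, contrast)

rng = np.random.default_rng(3)
dark = np.full((16, 16), 10.0)    # flat dark: high threshold
mid = np.full((16, 16), 127.0)    # flat mid-gray: low threshold
tex = np.clip(127 + 50 * rng.standard_normal((16, 16)), 0, 255)  # textured
```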

Journal ArticleDOI
TL;DR: This paper proposes an alternative perceptual video coding method to improve upon the current H.264/Advanced Video Coding (AVC) framework based on an independent JND-directed suppression tool, and analytically derives a JND mapping formula between the integer DCT domain and the classic DCT domain which permits the JND models to be reused in a more natural way.
Abstract: The field of video coding has been exploring the compact representation of video data, where perceptual redundancies in addition to signal redundancies are removed for higher compression. Many research efforts have been dedicated to modeling the human visual system's characteristics. The resulting models have been integrated into video coding frameworks in different ways. Among them, coding enhancements with the just noticeable distortion (JND) model have drawn much attention in recent years due to its significant gains. A common application of the JND model is the adjustment of quantization by a multiplying factor corresponding to the JND threshold. In this paper, we propose an alternative perceptual video coding method to improve upon the current H.264/Advanced Video Coding (AVC) framework based on an independent JND-directed suppression tool. This new tool is capable of finely tuning the quantization using a JND-normalized error model. To make full use of this new rate distortion adjustment component the Lagrange multiplier for rate distortion optimization is derived in terms of the equivalent distortion. Because the H.264/AVC integer discrete cosine transform (DCT) is different from the classic DCT, on which state-of-the-art JND models are computed, we analytically derive a JND mapping formula between the integer DCT domain and the classic DCT domain which permits us to reuse the JND models in a more natural way. In addition, the JND threshold can be refined by adopting a saliency algorithm in the coding framework and we reduce the complexity of the JND computation by reusing the motion estimation of the encoder. Another benefit of the proposed scheme is that it remains fully compliant with the existing H.264/AVC standard. Subjective experimental results show that significant bit saving can be obtained using our method while maintaining a similar visual quality to the traditional H.264/AVC coded video.

Journal ArticleDOI
TL;DR: A novel no-reference bitstream-based objective video quality metric is presented that is constructed by genetic programming-based symbolic regression and shows that perceived quality can be modeled with high accuracy using only parameters extracted from the received video bitstream.
Abstract: In order to ensure optimal quality of experience toward end users during video streaming, automatic video quality assessment becomes an important field of interest to video service providers. Objective video quality metrics try to estimate perceived quality with high accuracy and in an automated manner. In traditional approaches, these metrics model the complex properties of the human visual system. More recently, however, it has been shown that machine learning approaches can also yield competitive results. In this paper, we present a novel no-reference bitstream-based objective video quality metric that is constructed by genetic programming-based symbolic regression. A key benefit of this approach is that it calculates reliable white-box models that allow us to determine the importance of the parameters. Additionally, these models can provide human insight into the underlying principles of subjective video quality assessment. Numerical results show that perceived quality can be modeled with high accuracy using only parameters extracted from the received video bitstream.
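The white-box character of symbolic regression can be illustrated with a toy search over expression trees built from bitstream features. The feature names (`qp`, `loss_rate`) are hypothetical stand-ins for whatever the received bitstream exposes, and plain random search replaces the paper's genetic-programming machinery; the point is only that the fitted model is an inspectable expression rather than a black box.

```python
import random
import operator

# Hypothetical bitstream features; the operator set is deliberately tiny.
OPS = [(operator.add, '+'), (operator.sub, '-'), (operator.mul, '*')]

def random_expr(depth=2):
    """Build a random expression tree over feature names and constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(['qp', 'loss_rate', round(random.uniform(-1, 1), 2)])
    return (random.choice(OPS), random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, feats):
    """Evaluate an expression tree on a dictionary of feature values."""
    if isinstance(expr, tuple):
        (fn, _), a, b = expr
        return fn(evaluate(a, feats), evaluate(b, feats))
    return feats[expr] if isinstance(expr, str) else expr

def fit(samples, iters=2000, seed=0):
    """Random search for the expression tree with least squared error.

    samples : list of (feature-dict, subjective-score) pairs
    Returns the best expression (a readable, white-box model) and its error.
    """
    random.seed(seed)
    best, best_err = None, float('inf')
    for _ in range(iters):
        expr = random_expr()
        err = sum((evaluate(expr, f) - y) ** 2 for f, y in samples)
        if err < best_err:
            best, best_err = expr, err
    return best, best_err
```

Because the result is an explicit tree, the relative importance of `qp` versus `loss_rate` can be read directly off the winning expression, which is the interpretability benefit the abstract highlights.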

Journal ArticleDOI
TL;DR: It is shown that where observers look can strongly modulate their reports of simple surface attributes such as lightness: when observers matched the color of natural objects, they based their judgments on the brightest parts of the objects and, at the same time, tended to fixate points with above-average luminance.
Abstract: The variable resolution and limited processing capacity of the human visual system require us to sample the world with eye movements and attentive processes. Here we show that where observers look can strongly modulate their reports of simple surface attributes, such as lightness. When observers matched the color of natural objects they based their judgments on the brightest parts of the objects; at the same time, they tended to fixate points with above-average luminance. When we forced participants to fixate a specific point on the object using a gaze-contingent display setup, the matched lightness was higher when observers fixated bright regions. This finding indicates a causal link between the luminance of the fixated region and the lightness match for the whole object. Simulations with rendered physical lighting show that higher values in an object’s luminance distribution are particularly informative about reflectance. This sampling strategy is an efficient and simple heuristic for the visual system to achieve accurate and invariant judgments of lightness.
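The brightest-region heuristic lends itself to a one-function sketch: estimate an object's lightness from the upper tail of its luminance distribution, since the simulations above indicate that high luminance values are most diagnostic of reflectance. The 90th-percentile cut-off and the function name are illustrative choices, not the authors' fitted values.

```python
import numpy as np

def lightness_estimate(luminances, percentile=90):
    """Estimate surface lightness from the bright end of an object's
    luminance distribution, mimicking the observers' sampling strategy.
    Pixels at or above the given percentile are averaged; the cut-off
    is an assumption for illustration."""
    lum = np.asarray(luminances, dtype=float)
    bright = lum[lum >= np.percentile(lum, percentile)]
    return float(np.mean(bright))
```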

Journal ArticleDOI
TL;DR: This paper illustrates and exemplifies the good practices to be followed when using machine learning to model perceptual mechanisms, building on studies that have already demonstrated the ability of ML-based approaches to address visual quality assessment.
Abstract: Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms make it possible to tackle the quality assessment task from a different perspective, as the eventual goal is to mimic quality perception instead of designing an explicit model of the human visual system. Several studies have already proved the ability of ML-based approaches to address visual quality assessment; nevertheless, these paradigms are highly prone to overfitting, and their overall reliability may be questionable. In fact, a prerequisite for successfully using ML in modeling perceptual mechanisms is a profound understanding of the advantages and limitations that characterize learning machines. This paper illustrates and exemplifies the good practices to be followed.
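One basic guard against the overfitting risk raised above is to report a learned quality model's error on held-out data rather than on its training set. The sketch below is a generic k-fold cross-validation loop, not a procedure from the paper; the `fit`/`predict` callables and fold count are assumptions.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validated_error(X, y, fit, predict, k=5):
    """Average held-out squared error over k folds.

    fit(X, y) -> model, predict(model, X) -> predictions are supplied by
    the caller, so any learning machine can be plugged in.
    """
    folds = kfold_indices(len(y), k)
    errs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        errs.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
    return float(np.mean(errs))
```

Reporting this held-out error instead of the training error is precisely the kind of good practice that keeps an ML-based quality metric's reliability from being questionable.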

Journal ArticleDOI
TL;DR: A novel watermarking scheme embeds a binary watermark into gray-scale images using a hybrid GA-BPN intelligent network; the scheme is robust against selected attacks and is well optimized.

Journal ArticleDOI
26 Jun 2013
TL;DR: This report introduces visually selective attention, an important mechanism of perception that is becoming increasingly important for multimedia applications, and describes its underlying mechanisms.
Abstract: Making technological advances in the field of human-machine interactions requires that the capabilities and limitations of the human perceptual system be taken into account. The focus of this report is an important mechanism of perception, visually selective attention, which is becoming more and more important for multimedia applications. We introduce the concept of visual attention and describe its underlying mechanisms. In particular, we introduce the concepts of overt and covert visual attention, and of bottom-up and top-down processing. Challenges related to modeling visual attention and their validation using ad hoc ground truth are also discussed. Examples of the usage of visual attention models in image and video processing are presented. We emphasize multimedia delivery, retargeting and quality assessment of image and video, medical imaging, and the field of stereoscopic 3-D image applications.

Proceedings ArticleDOI
28 Mar 2013
TL;DR: A new tone mapping technique for high dynamic range images based on the retinex theory is presented, which provides satisfactory results while preserving details and reducing halo artifacts.
Abstract: In this paper, we present a new tone mapping technique for high dynamic range images based on the retinex theory. Our algorithm consists of two steps, global adaptation and local adaptation of the human visual system. In the local adaptation process, the Gaussian filter of the retinex algorithms is substituted with a guided filter to reduce halo artifacts. To guarantee good rendition and dynamic range compression, we propose a contrast enhancement factor based on the luminance values of the scene. In addition, an adaptive nonlinearity offset is introduced to deal with the strength of the logarithm function's nonlinearity. Experiments show that our algorithm provides satisfactory results while preserving details and reducing halo artifacts.
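The global adaptation step can be sketched as a logarithmic compression anchored at the scene's log-average luminance, in the spirit of retinex-based tone mappers. The exact normalization below is a common formulation chosen for illustration, not the authors' equation, and the local guided-filter stage is omitted.

```python
import numpy as np

def global_adaptation(lum, eps=1e-6):
    """Globally compress world luminance into [0, 1].

    The logarithm's strength is anchored at the log-average luminance,
    so both dim and bright scenes land in a comparable display range.
    eps avoids log(0) on pure-black pixels.
    """
    lum = np.asarray(lum, dtype=float)
    log_avg = np.exp(np.mean(np.log(lum + eps)))   # log-average luminance
    return np.log(lum / log_avg + 1.0) / np.log(lum.max() / log_avg + 1.0)
```

A nonlinearity offset (the paper's adaptive version of the `+ 1.0` term) would control how aggressively the logarithm compresses; here it is fixed for simplicity.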

Journal ArticleDOI
TL;DR: The objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints such as the main objective, the use of additional cues and mathematical principles.
Abstract: We humans are easily able to instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism called visual attention for image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints such as the main objective, the use of additional cues and mathematical principles. This survey finally discusses possible future directions for research into human visual attention and saliency computation.

Journal ArticleDOI
TL;DR: A new spatio-temporal saliency detection framework based on regularized feature reconstruction is proposed, which outperforms several state-of-the-art approaches.
Abstract: Multimedia applications such as image or video retrieval, copy detection, and so forth can benefit from saliency detection, which is essentially a method to identify areas in images and videos that capture the attention of the human visual system. In this paper, we propose a new spatio-temporal saliency detection framework on the basis of regularized feature reconstruction. Specifically, for video saliency detection, both temporal and spatial saliency detection are considered. For temporal saliency, we model the movement of the target patch as a reconstruction process using the patches in neighboring frames. A Laplacian smoothing term is introduced to model the coherent motion trajectories. In line with psychological findings that an abrupt stimulus can cause a rapid and involuntary deployment of attention, our temporal model combines the reconstruction error, regularizer, and local trajectory contrast to measure the temporal saliency. For spatial saliency, a similar sparse reconstruction process is adopted to capture the regions with high center-surround contrast. Finally, the temporal saliency and spatial saliency are combined together to favor salient regions with high confidence for video saliency detection. We also apply the spatial saliency part of the spatio-temporal model to image saliency detection. Experimental results on a human fixation video dataset and an image saliency detection dataset show that our method achieves the best performance over several state-of-the-art approaches.
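The reconstruction-error idea behind both the temporal and spatial models can be sketched compactly: try to express a center patch as a linear combination of its surround (or neighboring-frame) patches, and take a large residual as high saliency. The sketch uses plain ridge regularization as a stand-in for the paper's sparse and Laplacian-regularized formulations, and the function name and regularization weight are assumptions.

```python
import numpy as np

def reconstruction_saliency(center, surround, lam=0.1):
    """Saliency of a patch as its reconstruction error from surround patches.

    center   : feature vector of the patch under test
    surround : list of feature vectors of surrounding/neighboring patches
    lam      : ridge weight, a simple stand-in for sparse regularization
    A patch that cannot be reconstructed from its context stands out,
    i.e. has high center-surround contrast.
    """
    D = np.asarray(surround, dtype=float).T      # dictionary: patches as columns
    x = np.asarray(center, dtype=float)
    # Ridge solution: w = (D^T D + lam I)^{-1} D^T x
    w = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x)
    residual = x - D @ w
    return float(np.sum(residual ** 2))
```

A patch identical to one of its surround patches reconstructs almost perfectly (low saliency), while a patch orthogonal to all of them keeps its full energy as residual (high saliency), which is the behavior the framework relies on.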