
Showing papers on "Human visual system model published in 2009"


Book
21 Apr 2009
TL;DR: This book is the first comprehensive introduction to the multidisciplinary field of natural image statistics and explains both the basic theory and the most recent advances in a coherent and user-friendly manner.
Abstract: One of the most successful frameworks in computational neuroscience is modeling visual processing using the statistical structure of natural images. In this framework, the visual system of the brain constructs a model of the statistical regularities of the incoming visual data. This enables the visual system to perform efficient probabilistic inference. The same framework is also very useful in engineering applications such as image processing and computer vision. This book is the first comprehensive introduction to the multidisciplinary field of natural image statistics. The book starts with a review of background material in signal processing and neuroscience, which makes it accessible to a wide audience. The book then explains both the basic theory and the most recent advances in a coherent and user-friendly manner. This structure, together with the included exercises and computer assignments, also makes it an excellent textbook. "Natural Image Statistics" is a timely and valuable resource for advanced students and researchers in any discipline related to vision, such as neuroscience, computer science, psychology, electrical engineering, cognitive science or statistics.

384 citations


Journal ArticleDOI
TL;DR: This paper proposes a perceptual quality evaluation method for image fusion based on human visual system (HVS) models, and finds that the algorithm provides better predictions, more closely matched to human perceptual evaluations, than existing algorithms.

314 citations


Journal ArticleDOI
TL;DR: A DCT-based JND model for monochrome pictures is proposed that incorporates the spatial contrast sensitivity function (CSF), the luminance adaptation effect, and the contrast masking effect based on block classification, and is consistent with the human visual system.
Abstract: In the field of image and video processing, an effective compression algorithm should remove not only statistically redundant information but also perceptually insignificant components from pictures. The just-noticeable distortion (JND) profile is an efficient model for representing those perceptual redundancies: human eyes are usually not sensitive to distortion below the JND threshold. In this paper, a DCT-based JND model for monochrome pictures is proposed. This model incorporates the spatial contrast sensitivity function (CSF), the luminance adaptation effect, and the contrast masking effect based on block classification. Gamma correction is also considered to compensate for the original luminance adaptation effect, which gives more accurate results. In order to extend the proposed JND profile to video, a temporal modulation factor is included by incorporating the temporal CSF and eye movement compensation. Moreover, a psychophysical experiment was designed to parameterize the proposed model. Experimental results show that the proposed model is consistent with the human visual system (HVS). Compared with other JND profiles, the proposed model can tolerate more distortion and delivers much better perceptual quality. The model can be readily applied in many related areas, such as compression, watermarking, error protection, and perceptual distortion metrics.
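
Hedged sketch of how the multiplicative DCT-domain JND structure described above could be assembled in code: a base CSF threshold scaled by luminance adaptation and contrast masking factors. All constants, curve shapes, and the block-classification shortcut are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def csf_base_threshold(u, v, block=8):
    """Illustrative base threshold from a spatial CSF for DCT subband (u, v):
    sensitivity falls at high spatial frequencies, so the threshold rises."""
    f = np.hypot(u, v) / (2 * block)     # normalized radial frequency
    return 0.1 + 0.3 * f ** 2            # toy inverse-CSF curve

def luminance_adaptation(mean_lum):
    """The eye tolerates more distortion in very dark or very bright blocks."""
    return 1.0 + 1.2 * abs(mean_lum / 255.0 - 0.5)

def contrast_masking(block_dct, u, v):
    """Busy blocks mask distortion; the paper modulates this factor with an
    edge/plain/texture block classification, elided here."""
    return max(1.0, (abs(block_dct[u, v]) / 16.0) ** 0.36)

def jnd_threshold(block_dct, u, v):
    """Multiplicative JND threshold for coefficient (u, v) of one 8x8 block."""
    mean_lum = block_dct[0, 0] / 8.0     # DC coefficient carries block luminance
    return (csf_base_threshold(u, v)
            * luminance_adaptation(mean_lum)
            * contrast_masking(block_dct, u, v))
```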

257 citations


Journal ArticleDOI
TL;DR: A novel framework for IQA to mimic the human visual system (HVS) by incorporating the merits from multiscale geometric analysis (MGA), contrast sensitivity function (CSF), and Weber's law of just noticeable difference (JND) is developed.
Abstract: Reduced-reference (RR) image quality assessment (IQA) has been recognized as an effective and efficient way to predict the visual quality of distorted images. The current standard is the wavelet-domain natural image statistics model (WNISM), which applies the Kullback-Leibler divergence between the marginal distributions of wavelet coefficients of the reference and distorted images to measure the image distortion. However, WNISM fails to consider the statistical correlations of wavelet coefficients in different subbands and the visual response characteristics of the mammalian cortical simple cells. In addition, wavelet transforms are optimal greedy approximations to extract singularity structures, so they fail to explicitly extract the image geometric information, e.g., lines and curves. Finally, wavelet coefficients are dense for smooth image edge contours. In this paper, to target the aforementioned problems in IQA, we develop a novel framework for IQA to mimic the human visual system (HVS) by incorporating the merits from multiscale geometric analysis (MGA), contrast sensitivity function (CSF), and Weber's law of just noticeable difference (JND). In the proposed framework, MGA is utilized to decompose images and then extract features to mimic the multichannel structure of HVS. Additionally, MGA offers a series of transforms including wavelet, curvelet, bandelet, contourlet, wavelet-based contourlet transform (WBCT), and hybrid wavelets and directional filter banks (HWD), and different transforms capture different types of image geometric information. CSF is applied to weight coefficients obtained by MGA to simulate the appearance of images to observers by taking into account many of the nonlinearities inherent in HVS. JND is finally introduced to produce a noticeable variation in sensory experience. Thorough empirical studies are carried out upon the LIVE database against subjective mean opinion score (MOS) and demonstrate that 1) the proposed framework has good consistency with subjective perception values and the objective assessment results can well reflect the visual quality of images, 2) different transforms in MGA under the new framework perform better than the standard WNISM and some of them even perform better than the standard full-reference IQA model, i.e., the mean structural similarity index, and 3) HWD performs best among all transforms in MGA under the framework.
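
A minimal sketch of the pipeline's shape, substituting a plain wavelet from PyWavelets for the MGA transform family: decompose, weight subbands by a crude CSF, discard sub-JND coefficients, and compare subband histograms. The weights, threshold, and histogram settings are assumptions for illustration only.

```python
import numpy as np
import pywt  # PyWavelets; one stand-in for the MGA transform family

def rr_features(img, levels=3, csf_weights=(0.6, 1.0, 0.8), jnd=2.0):
    """Reduced-reference features: histograms of CSF-weighted, JND-thresholded
    detail coefficients per subband. Weights and threshold are illustrative."""
    coeffs = pywt.wavedec2(np.asarray(img, dtype=float), 'db2', level=levels)
    feats = []
    for lvl, bands in enumerate(coeffs[1:]):             # skip approximation band
        w = csf_weights[min(lvl, len(csf_weights) - 1)]  # crude per-scale CSF
        for band in bands:                               # LH, HL, HH
            c = (w * band).ravel()
            c = c[np.abs(c) > jnd]                       # keep supra-JND energy
            hist, _ = np.histogram(c, bins=32, range=(-64, 64))
            feats.append(hist / max(hist.sum(), 1) + 1e-8)
    return feats

def rr_distance(feats_ref, feats_dst):
    """Overall distortion: summed KL divergence between subband histograms."""
    return sum(float(np.sum(p * np.log(p / q)))
               for p, q in zip(feats_ref, feats_dst))
```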

251 citations


Proceedings ArticleDOI
19 Oct 2009
TL;DR: DVWs and DVPs are proposed as the visual correspondences to text words and phrases, where visual phrases refer to the frequently co-occurring visual word pairs, and are more comparable with the text words than the classic visual words.
Abstract: The Bag-of-visual Words (BoW) image representation has been applied to various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to the words in texts. However, extensive experiments show that the commonly used visual words are not as expressive as text words, which is undesirable because it hinders their effectiveness in various applications. In this paper, Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual correspondences to text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a novel descriptive visual element set can be composed from the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs from classic visual words for various applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain scenes or objects are identified as the DVWs and DVPs. Experiments show that the DVWs and DVPs are compact and descriptive, and thus are more comparable with text words than classic visual words. We apply the identified DVWs and DVPs in several applications including image retrieval, image re-ranking, and object recognition. The DVW and DVP combination outperforms the classic visual words by 19.5% and 80% in image retrieval and object recognition tasks, respectively. The DVW- and DVP-based image re-ranking algorithm, DWPRank, outperforms the state-of-the-art VisualRank by 12.4% in accuracy and is about 11 times faster.
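
As a rough illustration of the phrase-mining step, the sketch below counts visual-word pairs that co-occur within a spatial radius and keeps frequent pairs as candidate DVPs; the data layout, radius, and support threshold are hypothetical, and the paper's per-category descriptiveness criteria are omitted.

```python
from collections import Counter
from itertools import combinations
import numpy as np

def mine_visual_phrases(images_words, radius=30.0, min_support=50):
    """Hypothetical DVP mining sketch: a 'phrase' is a pair of visual-word IDs
    that frequently co-occur within `radius` pixels across the image set.
    images_words: one list per image of (word_id, x, y) tuples."""
    pair_counts = Counter()
    for words in images_words:
        seen = set()
        for (w1, x1, y1), (w2, x2, y2) in combinations(words, 2):
            if np.hypot(x1 - x2, y1 - y2) <= radius:
                seen.add((min(w1, w2), max(w1, w2)))
        pair_counts.update(seen)            # count each pair once per image
    return {p for p, c in pair_counts.items() if c >= min_support}
```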

245 citations


Journal ArticleDOI
TL;DR: It is shown that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure, and bottom-up visual saliency may not be computed cortically as has been thought previously.
Abstract: The human visual system is foveated, that is, outside the central visual field resolution and acuity drop rapidly. Nonetheless much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

Keywords: visual saliency, eye movements, receptive field analysis, classification images, kernel methods, support vector machines, natural scenes

Citation: Kienzle, W., Franz, M. O., Scholkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://journalofvision.org/9/5/7/, doi:10.1167/9.5.7.
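
The optimal predictor reported here is a kernel learned from fixation data; a difference-of-Gaussians filter is the classic hand-crafted center-surround stand-in, sketched below with assumed scales.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_saliency(gray, sigma_center=1.0, sigma_surround=5.0):
    """Difference-of-Gaussians response as a stand-in for the learned
    center-surround kernel; high responses rank candidate saccade targets."""
    img = gray.astype(float)
    dog = gaussian_filter(img, sigma_center) - gaussian_filter(img, sigma_surround)
    return np.abs(dog)
```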

233 citations


Journal ArticleDOI
TL;DR: A spatio-temporal saliency model that predicts eye movement during video free viewing inspired by the biology of the first steps of the human visual system is presented.
Abstract: This paper presents a spatio-temporal saliency model that predicts eye movement during video free viewing. This model is inspired by the biology of the first steps of the human visual system. The model extracts two signals from the video stream corresponding to the two main outputs of the retina: parvocellular and magnocellular. Then, both signals are split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps: a static and a dynamic one. These maps are then fused into a spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame predicted by the spatio-temporal saliency map to the eye positions of different subjects during a free video viewing experiment with a large database (17000 frames). In parallel, the static and the dynamic pathways are analyzed to understand what is more or less salient and for what types of videos our model is a good or a poor predictor of eye movement.
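
A toy rendition of the two-pathway idea, assuming a gradient-based static map and a frame-difference dynamic map in place of the model's retina-inspired and cortical-like filters; the fusion weight is arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def spatiotemporal_saliency(prev_frame, frame, alpha=0.5):
    """Fuse a 'parvocellular-like' static map (local spatial contrast) with a
    'magnocellular-like' dynamic map (temporal change) into one saliency map."""
    f = frame.astype(float)
    static = np.hypot(sobel(f, axis=0), sobel(f, axis=1))  # spatial contrast
    dynamic = np.abs(f - prev_frame.astype(float))         # temporal change
    norm = lambda m: m / (m.max() + 1e-8)
    fused = alpha * norm(static) + (1 - alpha) * norm(dynamic)
    return gaussian_filter(fused, 2.0)                     # smoothed map
```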

233 citations


Journal ArticleDOI
TL;DR: Video cameras have a single temporal limit set by the frame rate, but the human visual system has multiple temporal limits set by its various constituent mechanisms, which seem to form two groups that collaborate to create the unified visual experience.

195 citations


Proceedings ArticleDOI
29 Jul 2009
TL;DR: It is shown that the proposed metric correlates very well with subjective scores, especially for images with varying foreground and background perceived blur quality, at a significantly lower computational complexity than existing methods that take visual attention information into account.
Abstract: In this paper, a no-reference objective sharpness metric based on a cumulative probability of blur detection is proposed. The metric is evaluated by taking into account the Human Visual System (HVS) response to blur distortions. The perceptual significance of the metric is validated through subjective experiments. It is shown that the proposed metric results in a very good correlation with subjective scores especially for images with varying foreground and background perceived blur qualities. This is accomplished with a significantly lower computational complexity as compared to existing methods that take into account the visual attention information.
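
The cumulative-probability idea can be sketched compactly: score each detected edge with a psychometric probability of blur detection, then report the fraction of edges below the detection threshold. The psychometric form and constants below follow the just-noticeable-blur literature but should be treated as assumptions, and edge detection/width measurement are left out.

```python
import numpy as np

def cpbd_style_score(edge_widths, w_jnb=3.0, beta=3.6):
    """Sharpness score from measured edge widths (in pixels): the probability
    of detecting blur at each edge, pooled as the fraction of edges whose
    blur stays below the detection threshold (higher score = sharper)."""
    w = np.asarray(edge_widths, dtype=float)
    p_blur = 1.0 - np.exp(-(w / w_jnb) ** beta)   # psychometric function
    return float(np.mean(p_blur <= 0.63))         # 63% = detection threshold
```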

181 citations


Journal ArticleDOI
TL;DR: The authors tested the hypothesis that differences between the memory of a stimulus array and the perception of a new array are detected in a manner that is analogous to the detection of simple features in visual search tasks.
Abstract: The human visual system can notice differences between memories of previous visual inputs and perceptions of new visual inputs, but the comparison process that detects these differences has not been well characterized. In this study, the authors tested the hypothesis that differences between the memory of a stimulus array and the perception of a new array are detected in a manner that is analogous to the detection of simple features in visual search tasks. That is, just as the presence of a task-relevant feature in visual search can be detected in parallel, triggering a rapid shift of attention to the object containing the feature, the presence of a memory‐percept difference along a task-relevant dimension can be detected in parallel, triggering a rapid shift of attention to the changed object. Supporting evidence was obtained in a series of experiments in which manual reaction times, saccadic reaction times, and event-related potential latencies were examined. However, these experiments also showed that a slow, limited-capacity process must occur before the observer can make a manual change detection response.

161 citations


Journal ArticleDOI
TL;DR: It is found that V1, V2, and V3 are separable right into the center of the foveal confluence, and V1 ends as a rounded wedge with an affine mapping of the foveal singularity, indicating that more neuronal processing power is dedicated to second-level analysis in this small but important part of the visual field.
Abstract: The human visual system devotes a significant proportion of its resources to a very small part of the visual field, the fovea. Foveal vision is crucial for natural behavior and many tasks in daily life such as reading or fine motor control. Despite its significant size, this part of cortex is rarely investigated and the limited data have resulted in competing models of the layout of the foveal confluence in primate species. Specifically, how V2 and V3 converge at the central fovea is the subject of debate in primates and has remained “terra incognita” in humans. Using high-resolution fMRI (1.2 × 1.2 × 1.2 mm³) and carefully designed visual stimuli, we sought to accurately map the human foveal confluence and hence disambiguate the competing theories. We find that V1, V2, and V3 are separable right into the center of the foveal confluence, and V1 ends as a rounded wedge with an affine mapping of the foveal singularity. The adjacent V2 and, in contrast to current concepts from macaque monkey, also V3 maps form continuous bands (∼5 mm wide) around the tip of V1. This mapping results in a highly anisotropic representation of the visual field in these areas. Unexpectedly, for the centermost 0.75°, the cortical representations for both V2 and V3 are larger than that of V1, indicating that more neuronal processing power is dedicated to second-level analysis in this small but important part of the visual field.

Journal ArticleDOI
TL;DR: The present work shows that rats possess more advanced visual abilities than previously appreciated and provides the first systematic evidence for invariant object recognition in rats, and argues for an increased focus on rodents as models for studying high-level visual processing.
Abstract: The human visual system is able to recognize objects despite tremendous variation in their appearance on the retina resulting from variation in view, size, lighting, etc. This ability—known as “invariant” object recognition—is central to visual perception, yet its computational underpinnings are poorly understood. Traditionally, nonhuman primates have been the animal model-of-choice for investigating the neuronal substrates of invariant recognition, because their visual systems closely mirror our own. Meanwhile, simpler and more accessible animal models such as rodents have been largely overlooked as possible models of higher-level visual functions, because their brains are often assumed to lack advanced visual processing machinery. As a result, little is known about rodents' ability to process complex visual stimuli in the face of real-world image variation. In the present work, we show that rats possess more advanced visual abilities than previously appreciated. Specifically, we trained pigmented rats to perform a visual task that required them to recognize objects despite substantial variation in their appearance, due to changes in size, view, and lighting. Critically, rats were able to spontaneously generalize to previously unseen transformations of learned objects. These results provide the first systematic evidence for invariant object recognition in rats and argue for an increased focus on rodents as models for studying high-level visual processing.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work presents a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation, in a cue independent manner and evaluates the performance of the proposed algorithm on challenging videos and stereo pairs.
Abstract: The human visual system observes and understands a scene/image by making a series of fixations. Every “fixation point” lies inside a particular region of arbitrary shape and size in the scene which can either be an object or just a part of it. We define as a basic segmentation problem the task of segmenting that region containing the “fixation point”. Segmenting this region is equivalent to finding the enclosing contour - a connected set of boundary edge fragments in the edge map of the scene - around the fixation. We present here a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation. The proposed segmentation framework combines monocular cues (color/intensity/texture) with stereo and/or motion, in a cue independent manner. We evaluate the performance of the proposed algorithm on challenging videos and stereo pairs. Although the proposed algorithm is more suitable for an active observer capable of fixating at different locations in the scene, it applies to a single image as well. In fact, we show that even with monocular cues alone, the introduced algorithm performs as well or better than a number of image segmentation algorithms, when applied to challenging inputs.

Journal ArticleDOI
TL;DR: Experimental results for six kinds of cluttered background images show that the proposed TMSCR produces fewer false alarms than the Top-hat method at the same detection rate.
Abstract: Robust detection of small targets is very important in IRST (Infrared Search and Track). This paper presents a novel mathematical method for the incoming target detection problem in cluttered backgrounds, motivated by the robust properties of the human visual system (HVS). The HVS shows the best efficiency and robustness for an object detection task. The robust properties of the HVS are its contrast mechanism, multi-resolution representation, size adaptation, and pop-out phenomena. Based on these facts, a plausible computational model integrating them is proposed using Laplacian scale-space theory and a Tune-Max based optimization method. Simultaneous target signal enhancement and background clutter suppression is achieved by tuning and maximizing the signal-to-clutter ratio (TMSCR) in Laplacian scale-space. At the first stage, the Tune-Max of the signal-to-background contrast produces candidate targets with adapted scale. At the second stage, the Tune-Max of the signal-to-clutter ratio (SCR) produces the maximal SCR, which is used to pop out detections. Experimental evaluation on the incoming target sequence validates the upgraded detection capability of the proposed method compared with the Top-hat method at the same false alarm rate. Experimental results for six kinds of cluttered background images show that the proposed TMSCR produces fewer false alarms (a 4.3-fold reduction) than the Top-hat method at the same detection rate.
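
A minimal sketch of the scale-tuning idea using a Laplacian-of-Gaussian stack: maximize the scale-normalized response per pixel, then keep pixels whose crude signal-to-clutter ratio exceeds a threshold. The clutter estimate and thresholds are assumptions; the actual method tunes contrast and SCR per candidate in two stages.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def detect_small_targets(img, sigmas=(1, 2, 3, 4), scr_thresh=5.0):
    """Scale-adapted small-target detection sketch for a grayscale IR frame."""
    f = img.astype(float)
    # negative, scale-normalized LoG responds strongly to bright blobs
    stack = np.stack([-s**2 * gaussian_laplace(f, s) for s in sigmas])
    response = stack.max(axis=0)              # best scale per pixel
    clutter = response.std() + 1e-8           # crude global clutter estimate
    scr = (response - response.mean()) / clutter
    return np.argwhere(scr > scr_thresh)      # (row, col) pop-out detections
```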

Journal ArticleDOI
TL;DR: It is shown that very small thumbnail images at the spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes and permit observers to report four to five of the objects that the scene contains, despite the fact that some of these objects are unrecognizable in isolation.
Abstract: The human visual system is remarkably tolerant to degradation in image resolution: human performance in scene categorization remains high no matter whether low-resolution images or multimegapixel images are used. This observation raises the question of how many pixels are required to form a meaningful representation of an image and identify the objects it contains. In this article, we show that very small thumbnail images at the spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes. Most strikingly, this low resolution permits observers to report, with 80% accuracy, four to five of the objects that the scene contains, despite the fact that some of these objects are unrecognizable in isolation. The robustness of the information available at very low resolution for describing the semantic content of natural images could be an important asset to explain the speed and efficiency with which the human brain comprehends the gist of visual scenes.

Journal ArticleDOI
TL;DR: Local color transfer algorithms are proposed to resolve the mixing up of colors in different regions and the fidelity problem in Reinhard et al.'s pioneering work.
Abstract: Color transfer is an image processing technique which can produce a new image combining one source image's contents with another image's color style. While being able to produce convincing results, however, Reinhard et al.'s pioneering work has two problems—mixing up of colors in different regions and the fidelity problem. Many local color transfer algorithms have been proposed to resolve the first problem, but the second problem has received little attention. In this paper, a novel color transfer algorithm is presented to resolve the fidelity problem of color transfer in terms of scene details and colors. It is well known that the human visual system is more sensitive to local intensity differences than to intensity itself. We thus consider that preserving the color gradient is necessary for scene fidelity. We formulate the color transfer problem as an optimization problem and solve it in two steps—histogram matching and a gradient-preserving optimization. Following the idea of fidelity in terms of color and gradient, we also propose a metric for objectively evaluating the performance of example-based color transfer algorithms. The experimental results show the validity and high fidelity of our algorithm, and that it can be used to deal with local color transfer.
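
The two-step formulation can be prototyped per channel as below: start from the histogram-matched result, then run a simple gradient descent on a data term plus a gradient-fidelity term so the output keeps the source's local gradients. The step size, weight, and boundary handling are assumptions, and this plain descent merely stands in for the paper's optimizer.

```python
import numpy as np

def grad(im):
    """Forward-difference image gradients (zero at the far edges)."""
    gx = np.diff(im, axis=1, append=im[:, -1:])
    gy = np.diff(im, axis=0, append=im[-1:, :])
    return gx, gy

def gradient_preserving_transfer(matched, source, lam=2.0, iters=300, lr=0.05):
    """Per-channel sketch: descend on
    ||I - matched||^2 + lam * ||grad I - grad source||^2
    so the result keeps the matched colors but the source's detail."""
    I = matched.astype(float).copy()
    sgx, sgy = grad(source.astype(float))
    for _ in range(iters):
        gx, gy = grad(I)
        rx, ry = gx - sgx, gy - sgy
        # divergence of the gradient residual (adjoint of grad, up to sign)
        div = (np.diff(rx, axis=1, prepend=rx[:, :1])
             + np.diff(ry, axis=0, prepend=ry[:, :1]))
        I -= lr * ((I - matched) - lam * div)
    return I
```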

Journal ArticleDOI
TL;DR: This paper focuses on the design of reduced-reference objective perceptual image quality metrics for use in wireless imaging, and shows their excellent correlation with human perception in terms of accuracy, monotonicity, and consistency.
Abstract: The rapid growth of third-generation and the development of future-generation mobile systems have led to an increase in the demand for image and video services. However, the hostile nature of the wireless channel makes the deployment of such services much more challenging than in a wireline system. In this context, the importance of taking care of user satisfaction with service provisioning as a whole has been recognized. The related user-oriented quality concepts cover end-to-end quality of service and subjective factors such as experiences with the service. To monitor quality and adapt system resources, performance indicators that represent service integrity have to be selected and related to objective measures that correlate well with the quality as perceived by humans. Such objective perceptual quality metrics can then be utilized to optimize quality perception associated with applications in technical systems. In this paper, we focus on the design of reduced-reference objective perceptual image quality metrics for use in wireless imaging. Specifically, the normalized hybrid image quality metric (NHIQM) and a perceptual relevance weighted Lp-norm are designed. The main idea behind both feature-based metrics relates to the fact that the human visual system (HVS) is trained to extract structural information from the viewing area. Accordingly, NHIQM and the Lp-norm are designed to account for different structural artifacts that have been observed in our distortion model of a wireless link. The extent to which individual artifacts are present in a given image is obtained by measuring related image features. The overall quality measure is then computed as a weighted sum of the features with the respective perceptual relevance weights obtained from subjective experiments. The proposed metrics differ mainly in the pooling of the features and the amount of reduced-reference data produced. While NHIQM performs the pooling at the transmitter of the system to produce a single value as reduced reference, the Lp-norm requires all involved feature values from the transmitted and received image to perform the pooling on the feature differences at the receiver. In addition, non-linear mapping functions are developed that relate the metric values to predicted mean opinion scores (MOS) and account for saturations in the HVS. The evaluation of the prediction performance of NHIQM and the Lp-norm reveals their excellent correlation with human perception in terms of accuracy, monotonicity, and consistency. This holds not only for the prediction performance on images used for training the metrics but also for the generalization to unknown images. In addition, it is shown that the NHIQM approach and the perceptual relevance weighted Lp-norm outperform other prominent objective quality metrics in prediction performance.
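
In spirit, the reduced-reference side of NHIQM is a perceptual-relevance-weighted feature sum, and the prediction side is a nonlinear mapping to MOS that saturates at the scale's ends. The sketch below shows that shape only; feature names, weights, and logistic constants are placeholders, not the published values.

```python
import numpy as np

def nhiqm_style_value(features, weights):
    """Single reduced-reference value: weighted sum of measured artifact
    features (e.g., blocking, blur, ringing); the weights would come from
    subjective experiments."""
    return float(np.dot(weights, features))

def map_to_mos(delta, a=4.0, b=6.0, c=0.5):
    """Illustrative logistic mapping from the metric difference between
    transmitted and received images to a predicted MOS, saturating at the
    quality-scale extremes as the HVS does (larger delta = worse quality)."""
    return 1.0 + a / (1.0 + np.exp(b * (delta - c)))
```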

Journal ArticleDOI
TL;DR: This paper seeks to advance visualization methods by proposing a framework for human 'higher cognition' that extends more familiar perceptual models and suggests guidelines for the development of visual interfaces that better integrate complementary capabilities of humans and computers.
Abstract: It is well known that visual analytics addresses the difficulty of evaluating and processing large quantities of information. Less often discussed are the increasingly complex analytic and reasoning processes that must be applied in order to accomplish that goal. Success of the visual analytics approach will require us to develop new visualization models that predict how computational processes might facilitate human insight and guide the flow of human reasoning. In this paper, we seek to advance visualization methods by proposing a framework for human 'higher cognition' that extends more familiar perceptual models. Based on this approach, we suggest guidelines for the development of visual interfaces that better integrate complementary capabilities of humans and computers. Although many of these recommendations are novel, some can be found in existing visual analytics applications. In the latter case, much of the value of our contribution lies in the deeper rationale that the model provides for those principles. Lastly, we assess these visual analytics guidelines through the evaluation of several visualization examples.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method to enhance the image quality for a given backlight intensity by performing brightness compensation and local contrast enhancement, where global image statistics and backlight level are considered to maintain the overall brightness of the image.
Abstract: One common way to extend the battery life of a portable device is to reduce the LCD backlight intensity. In contrast to previous approaches that minimize the power consumption by adjusting the backlight intensity frame by frame to reach a specified image quality, the proposed method optimizes the image quality for a given backlight intensity. The image is enhanced by performing brightness compensation and local contrast enhancement. For brightness compensation, global image statistics and the backlight level are considered to maintain the overall brightness of the image. For contrast enhancement, the local contrast property of the human visual system (HVS) is exploited to enhance local image details. In addition, a brightness prediction scheme is proposed to speed up the algorithm for display of video sequences. Experimental results are presented to show the performance of the algorithm.
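
A rough sketch of the two enhancement steps, assuming a gamma-style global brightening and an unsharp-mask-style local contrast boost; the paper's compensation is driven by global image statistics and the backlight level rather than these fixed curves.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance_for_backlight(img, backlight=0.6, k_local=0.4):
    """Brighten globally to offset a dimmed backlight, then boost local
    contrast to restore detail; exponent and gain are illustrative."""
    f = img.astype(float) / 255.0
    compensated = np.clip(f ** backlight, 0, 1)   # gamma-style brightening
    base = gaussian_filter(compensated, 8.0)      # local mean luminance
    enhanced = compensated + k_local * (compensated - base)
    return np.clip(enhanced * 255.0, 0, 255).astype(np.uint8)
```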

Journal ArticleDOI
03 Feb 2009
TL;DR: An algorithm for estimating the roughness of a 3D mesh, as a local measure of geometric noise on the surface, is introduced; it is based on curvature analysis on local windows of the mesh and is independent of the resolution/connectivity of the object.
Abstract: 3D models are subject to a wide variety of processing operations such as compression, simplification or watermarking, which may introduce some geometric artifacts on the shape. The main issue is to maximize the compression/simplification ratio or the watermark strength while minimizing these visual degradations. However, few algorithms exploit the human visual system to hide these degradations, although perceptual attributes could be quite relevant for this task. In particular, the masking effect describes the fact that one visual pattern can hide the visibility of another. In this context we introduce an algorithm for estimating the roughness of a 3D mesh, as a local measure of geometric noise on the surface. Indeed, a textured (or rough) region is able to hide geometric distortions much better than a smooth one. Our measure is based on curvature analysis on local windows of the mesh and is independent of the resolution/connectivity of the object. The accuracy and the robustness of our measure, together with its relevance regarding visual masking, have been demonstrated through extensive comparisons with the state of the art and a subjective experiment. Two applications are also presented, in which the roughness is used to guide (and improve) compression and watermarking algorithms, respectively.
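
Given per-vertex curvatures and local vertex windows (both assumed precomputed with a mesh library of choice), the roughness idea can be sketched as the spread of curvature within each window; the inputs and the use of a standard deviation are assumptions.

```python
import numpy as np

def roughness_per_vertex(curvature, neighbors):
    """Roughness sketch: the spread of precomputed per-vertex curvature inside
    each vertex's local window. `neighbors[i]` lists the vertex ids in vertex
    i's window; rough regions show high curvature variation and thus a high
    capacity to mask geometric distortion."""
    curvature = np.asarray(curvature, dtype=float)
    rough = np.empty(len(curvature), dtype=float)
    for i, nbrs in enumerate(neighbors):
        window = curvature[list(nbrs) + [i]]
        rough[i] = window.std()          # local geometric-noise estimate
    return rough
```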

Book ChapterDOI
Yin Li, Yue Zhou, Junchi Yan, Zhibin Niu, Jie Yang
23 Sep 2009
TL;DR: A novel visual saliency detection method - the conditional saliency - for both image and video, which approximates the conditional entropy by the lossy coding length of multivariate Gaussian data and yields a robust, reliable, feature-invariant saliency.
Abstract: By the guidance of attention, the human visual system is able to locate objects of interest in complex scenes. In this paper, we propose a novel visual saliency detection method - the conditional saliency - for both image and video. Inspired by biological vision, the definition of visual saliency follows a strictly local approach. Given the surrounding area, the saliency is defined as the minimum uncertainty of the local region, namely the minimum conditional entropy, when perceptional distortion is considered. To simplify the problem, we approximate the conditional entropy by the lossy coding length of multivariate Gaussian data. The final saliency map is accumulated pixel by pixel and further segmented to detect the proto-objects. Experiments are conducted on both image and video, and the results indicate a robust and reliable, feature-invariant saliency.
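
The approximation can be written down compactly with the familiar log-det coding length for Gaussian data: the saliency of a center patch is its extra coding cost given its surround. Patch vectorization, the distortion parameter, and the exact normalization are assumptions here.

```python
import numpy as np

def coding_length(X, eps=1.0):
    """Lossy coding length of X (n samples x d dims) under a Gaussian model,
    in the log-det form; eps sets the allowed distortion."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    sign, logdet = np.linalg.slogdet(np.eye(d) + (d / (eps**2 * n)) * cov)
    return 0.5 * (n + d) * logdet

def conditional_saliency(center, surround, eps=1.0):
    """Saliency of the center patch as its extra coding cost given the
    surround: an approximation to the minimum conditional entropy."""
    joint = np.vstack([center, surround])
    return coding_length(joint, eps) - coding_length(surround, eps)
```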

Journal ArticleDOI
TL;DR: The experimental results demonstrate the superiority of the proposed reversible visible watermarking scheme over existing methods; the scheme also adopts data compression for further reduction in the recovery-packet size and improvement in embedding capacity.
Abstract: A reversible (also called lossless, distortion-free, or invertible) visible watermarking scheme is proposed to satisfy applications in which the visible watermark is expected to combat copyright piracy but can be removed to losslessly recover the original image. We transparently reveal the watermark image by overlapping it on a user-specified region of the host image through adaptively adjusting the pixel values beneath the watermark, depending on human visual system-based scaling factors. In order to achieve reversibility, a reconstruction/recovery packet, which is utilized to restore the watermarked area, is reversibly inserted into the non-visibly-watermarked region. The packet is established according to the difference image between the original image and its approximate version, instead of its visibly watermarked version, so as to alleviate its overhead. For the generation of the approximation, we develop a simple prediction technique that makes use of the unaltered neighboring pixels as auxiliary information. The recovery packet is uniquely encoded before hiding so that the original watermark pattern can be reconstructed based on the encoded packet. In this way, the image recovery process is carried out without needing the availability of the watermark. In addition, our method adopts data compression for further reduction in the recovery packet size and improvement in embedding capacity. The experimental results demonstrate the superiority of the proposed scheme compared to the existing methods.

BookDOI
25 May 2009
TL;DR: The wide-ranging volume offers an overview of cutting-edge research into the newest tensor processing techniques and their application to different domains related to computer vision and image processing.
Abstract: Tensor signal processing is an emerging field with important applications to computer vision and image processing. This book presents the state of the art in this new branch of signal processing, offering a great deal of research and discussion by leading experts in the area. The wide-ranging volume offers an overview of cutting-edge research into the newest tensor processing techniques and their application to different domains related to computer vision and image processing. This comprehensive text will prove to be an invaluable reference and resource for researchers, practitioners and advanced students working in the area of computer vision and image processing.

Journal ArticleDOI
01 Dec 2009
TL;DR: A novel reduced reference IQA scheme is developed by incorporating the merits from the contourlet transform, contrast sensitivity function (CSF), and Weber's law of just noticeable difference (JND) to produce a noticeable variation in sensory experience.
Abstract: The human visual system (HVS) provides a suitable cue for image quality assessment (IQA). In this paper, we develop a novel reduced reference (RR) IQA scheme by incorporating the merits of the contourlet transform, the contrast sensitivity function (CSF), and Weber's law of just noticeable difference (JND). In this scheme, the contourlet transform is utilized to decompose images and then extract features to mimic the multichannel structure of the HVS. The CSF is applied to weight coefficients obtained by the contourlet transform to simulate the appearance of images to observers by taking into account many of the nonlinearities inherent in the HVS. JND is finally introduced to produce a noticeable variation in sensory experience. Thorough empirical studies are carried out upon the Laboratory for Image and Video Engineering (LIVE) database against the subjective mean opinion score and demonstrate that the proposed framework has good consistency with subjective perception values and that the objective assessment results can well reflect the visual quality of images.

Journal ArticleDOI
TL;DR: It is shown that the human visual system adaptively switches between 1D pooling and 2D pooling depending on the input, exhibiting great flexibility when estimating complex optic flows in natural scenes.
Abstract: The two-dimensional (2D) trajectory of visual motion is usually not directly available to the visual system. Local one-dimensional (1D) sensors initiate processing but can only restrict the solution to a set of speed and direction combinations consistent with the 2D trajectory. These 1D signals are then integrated across orientation and space to compute 2D signals. Both motion integrations are thought to occur in higher cortical areas, but it remains unclear whether 1D signals are integrated over orientation and space simultaneously (1D pooling process), or instead are integrated locally with the resulting 2D signals then spatially integrated (2D pooling process). From psychophysical responses to novel global-motion stimuli comprised of numerous Gabor (1D) or Plaid (2D) elements, here we show that the human visual system adaptively switches between 1D pooling and 2D pooling depending on the input. When local 2D signals cannot be determined, the visual system shows effective 1D pooling that approximately follows the intersection of constraints rule. On the other hand, when local 2D signals are available, the visual system shows 2D pooling that approximately follows the vector average rule. Spatial motion integration therefore exhibits great flexibility when estimating complex optic flows in natural scenes.
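
The two pooling rules can be made concrete in a few lines: the intersection-of-constraints (IOC) solution solves the component constraints v · n_i = s_i jointly, while the vector average simply averages the component velocity vectors, giving a shorter readout.

```python
import numpy as np

def ioc_velocity(normals, speeds):
    """Intersection of constraints: solve v . n_i = s_i (least squares) for
    the single 2D velocity consistent with every 1D component signal."""
    N = np.asarray(normals, dtype=float)    # k x 2 unit normals
    s = np.asarray(speeds, dtype=float)     # k normal speeds
    v, *_ = np.linalg.lstsq(N, s, rcond=None)
    return v

def vector_average(normals, speeds):
    """Vector-average rule: mean of the component velocity vectors, the
    readout associated with pooling of locally available 2D signals."""
    N = np.asarray(normals, dtype=float)
    s = np.asarray(speeds, dtype=float)
    return (N * s[:, None]).mean(axis=0)

# Two gratings with orthogonal orientations, each drifting at unit speed:
n, sp = [(1.0, 0.0), (0.0, 1.0)], [1.0, 1.0]
print(ioc_velocity(n, sp))     # [1. 1.]   -> IOC recovers the common 2D motion
print(vector_average(n, sp))   # [0.5 0.5] -> shorter, averaged readout
```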

Patent
07 Aug 2009
TL;DR: In this paper, a visual prosthesis codes visual signals into electrical stimulation patterns for the creation of artificial vision using image compression techniques, temporal coding strategies, continuous interleaved sampling (CIS), and/or radar or sonar data.
Abstract: A visual prosthesis codes visual signals into electrical stimulation patterns for the creation of artificial vision. In some examples, coding of the information uses image compression techniques, temporal coding strategies, continuous interleaved sampling (CIS), and/or radar or sonar data. Examples of the approach are not limited to processing visual signals but can also be used to process signals in other frequency ranges (e.g., infrared, radio frequency, and ultrasound), for instance, creating an augmented visual sensation.

Journal ArticleDOI
TL;DR: A novel no-reference blockiness metric that provides a quantitative measure of blocking annoyance in block-based DCT coding is presented and shown to be highly consistent with subjective data at a reduced computational load.
Abstract: A novel no-reference blockiness metric that provides a quantitative measure of blocking annoyance in block-based DCT coding is presented. The metric incorporates properties of the human visual system (HVS) to improve its reliability, while the additional cost introduced by the HVS is minimized to ensure its use for real-time processing. This is mainly achieved by calculating the local pixel-based distortion of the artifact itself, combined with its local visibility by means of a simplified model of visual masking. The overall computational efficiency and metric accuracy are further improved by including a grid detector to identify the exact location of blocking artifacts in a given image. The metric, calculated only at the detected blocking artifacts, is averaged over all blocking artifacts in the image to yield an overall blockiness score. The performance of this metric is compared to existing alternatives in the literature and is shown to be highly consistent with subjective data at a reduced computational load. As such, the proposed blockiness metric is promising in terms of both computational efficiency and practical reliability for real-life applications.
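
A crude, masking-free sketch of the core cue, gradient energy across 8x8 block boundaries relative to within-block gradients, is given below; the actual metric adds HVS-based visibility weighting and a grid detector on top of such a cue, so treat this as illustrative only.

```python
import numpy as np

def blockiness_score(gray, block=8):
    """No-reference blockiness cue for a grayscale image whose width is a
    multiple of `block`: boundary gradients vs. within-block gradients."""
    f = gray.astype(float)
    dh = np.abs(np.diff(f, axis=1))                # horizontal gradients
    at_boundary = dh[:, block - 1::block].mean()   # columns 7|8, 15|16, ...
    mask = np.ones(dh.shape[1], dtype=bool)
    mask[block - 1::block] = False
    inside = dh[:, mask].mean()
    return at_boundary / (inside + 1e-8)           # >1 suggests visible blocking
```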

Proceedings ArticleDOI
Yin Li, Yue Zhou, Lei Xu, Xiaochao Yang, Jie Yang
07 Nov 2009
TL;DR: A new visual saliency detection model for both image and video is proposed; inspired by biological vision, saliency is defined locally and lossy compression is adopted.
Abstract: By the guidance of attention, the human visual system is able to locate objects of interest in complex scenes. We propose a new visual saliency detection model for both image and video. Inspired by biological vision, saliency is defined locally. Lossy compression is adopted, where the saliency of a location is measured by the Incremental Coding Length (ICL). The ICL is computed by presenting the center patch as the sparsest linear representation of its surroundings. The final saliency map is generated by accumulating the coding length. The model is tested on both images and videos. The results indicate that the resulting saliency is reliable and robust.

Journal ArticleDOI
TL;DR: A no-reference technique is presented that uses the multiple neural channels of the human visual system to quantify visual impairment by independently altering the outputs of these sensory channels using a statistical "standard score" formula in the Fourier domain.