
Showing papers on "Human visual system model published in 2003"


Proceedings ArticleDOI
09 Nov 2003
TL;DR: This paper proposes a multiscale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions, and develops an image synthesis method to calibrate the parameters that define the relative importance of different scales.
Abstract: The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multiscale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.

4,333 citations
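As a rough illustration of the multiscale idea, the sketch below computes a simplified MS-SSIM: SSIM is evaluated from global image statistics (the paper uses local sliding windows) at five dyadic scales and combined with the exponent weights commonly reported for this method; constants assume a [0, 255] intensity range.

```python
import numpy as np

def ssim_global(x, y, C1=6.5025, C2=58.5225):
    # Simplified SSIM from global (whole-image) statistics; the paper
    # uses local sliding windows. Constants assume a [0, 255] range.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

def downsample2(img):
    # Low-pass (2x2 average) and dyadic subsampling between scales.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    # Combine the per-scale scores with the commonly cited exponents.
    score = 1.0
    for w in weights:
        score *= ssim_global(x, y) ** w
        x, y = downsample2(x), downsample2(y)
    return score
```

Identical images score 1; any distortion pulls the product below 1, with each scale's contribution controlled by its weight.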


Journal ArticleDOI
TL;DR: This work proposes a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system, and suggests that the algorithms of particle filtering and Bayesian-belief propagation might model these interactive cortical computations.
Abstract: Traditional views of visual processing suggest that early visual neurons in areas V1 and V2 are static spatiotemporal filters that extract local features from a visual scene. The extracted information is then channeled through a feedforward chain of modules in successively higher visual areas for further analysis. Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long-latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system. They lead us to propose a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system. In this framework, the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference along the visual hierarchy. We suggest that the algorithms of particle filtering and Bayesian-belief propagation might model these interactive cortical computations. We review some recent neurophysiological evidence that supports the plausibility of these ideas.

1,431 citations
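The particle-filtering algorithm the authors invoke can be illustrated on a toy problem. The sketch below is a minimal bootstrap particle filter for a 1-D random-walk state observed in Gaussian noise; it is a stand-in for the kind of sequential Bayesian inference proposed here, not a model of cortical circuitry, and all parameters are illustrative.

```python
import numpy as np

def particle_filter(observations, n=1000, obs_noise=0.5, proc_noise=0.2, rng=None):
    # Minimal bootstrap particle filter for a 1-D random-walk state.
    rng = np.random.default_rng(0) if rng is None else rng
    particles = rng.normal(0.0, 1.0, n)
    estimates = []
    for z in observations:
        particles = particles + rng.normal(0.0, proc_noise, n)   # predict (prior)
        w = np.exp(-0.5 * ((z - particles) / obs_noise) ** 2)    # likelihood
        w /= w.sum()
        idx = rng.choice(n, n, p=w)                              # resample (posterior)
        particles = particles[idx]
        estimates.append(particles.mean())
    return np.array(estimates)
```

Each iteration propagates hypotheses forward (the "feedforward sweep" analogue), weights them by the observation likelihood, and resamples, so the particle cloud approximates the posterior at every step.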


Proceedings Article
01 Dec 2003
TL;DR: This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions, and develops an image synthesis method to calibrate the parameters that define the relative importance of different scales.
Abstract: The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.

1,205 citations


Journal ArticleDOI
TL;DR: Three methods for visual cryptography of gray-level and color images based on past studies in black-and-white visual cryptography, the halftone technology, and the color decomposition method are proposed.

463 citations


Journal ArticleDOI
TL;DR: Two new mechanisms that direct visual attention in the proposed object-based visual attention system are both object-driven and feature-driven; the first mechanism computes the visual salience of objects and groupings and implements the hierarchical selectivity of attentional shifts.

333 citations


Journal ArticleDOI
TL;DR: A new algorithm for unsupervised enhancement of digital images with simultaneous global and local effects, called ACE (Automatic Color Equalization), based on a computational model of the human visual system that merges the two basic "Gray World" and "White Patch" global equalization mechanisms.

323 citations
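The two global equalization mechanisms that ACE merges can each be written in a few lines. The sketch below shows plain Gray World and White Patch corrections on an RGB image in [0, 1]; it is not the full ACE algorithm, which adds local spatial filtering on top of these global mechanisms.

```python
import numpy as np

def gray_world(img):
    # Gray World: scale each channel so its mean matches the global mean,
    # pushing the average scene color toward gray.
    means = img.mean(axis=(0, 1))
    return img * (means.mean() / means)

def white_patch(img):
    # White Patch: scale each channel so its maximum maps to white (1.0).
    return img / img.max(axis=(0, 1))
```

On an image with a uniform color cast, Gray World equalizes the channel means while White Patch anchors the brightest response of each channel to white.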


Journal ArticleDOI
TL;DR: The Efficient Coding Hypothesis, which holds that the purpose of early visual processing is to produce an efficient representation of the incoming visual signal, provides a quantitative link between the statistical properties of the world and the structure of the visual system.

300 citations


Journal ArticleDOI
TL;DR: It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image.
Abstract: Models of visual attention have focused predominantly on bottom-up approaches that ignored structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information can become available early in the visual processing chain, which allows modulation of the saliency of image regions and provides an efficient shortcut for object detection and recognition.

277 citations
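A minimal stand-in for such global scene statistics: pool gradient energy over a coarse spatial grid to get a holistic feature vector that could prime object presence and location. Real "gist" descriptors use banks of oriented multi-scale filters; this two-orientation version is only illustrative.

```python
import numpy as np

def gist_features(img, grid=4):
    # Coarse holistic descriptor: horizontal/vertical gradient energy
    # pooled over a grid x grid spatial layout.
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = (slice(i * h // grid, (i + 1) * h // grid),
                    slice(j * w // grid, (j + 1) * w // grid))
            feats += [np.abs(gx[cell]).mean(), np.abs(gy[cell]).mean()]
    return np.array(feats)
```

The resulting low-dimensional vector summarizes the whole image before any object is segmented, which is the sense in which context becomes available early in the processing chain.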


Journal ArticleDOI
TL;DR: A (k, n)-threshold visual cryptography scheme is proposed to encode a secret image into n shadow images, where any k or more of them can visually recover the secret image, but any k - 1 or fewer of them gain no information about it.

268 citations
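The classic (2, 2) instance of such a threshold scheme is easy to sketch: each secret pixel expands into two subpixels per share, so either share alone is uniformly random while stacking both (OR of the inked subpixels) reveals the contrast.

```python
import numpy as np

def make_shares(secret, rng=None):
    # (2, 2)-threshold visual cryptography: for a white pixel (0) both
    # shares get the same random subpixel pattern; for a black pixel (1)
    # they get complementary ones.
    if rng is None:
        rng = np.random.default_rng()
    h, w = secret.shape
    s1 = np.zeros((h, 2 * w), dtype=int)
    s2 = np.zeros((h, 2 * w), dtype=int)
    for i in range(h):
        for j in range(w):
            pat = [1, 0] if rng.integers(2) else [0, 1]  # one inked subpixel
            s1[i, 2 * j:2 * j + 2] = pat
            s2[i, 2 * j:2 * j + 2] = pat if secret[i, j] == 0 else [1 - p for p in pat]
    return s1, s2
```

Stacking the shares leaves one inked subpixel per white pixel and two per black pixel, so the secret is recovered visually; each share on its own always has exactly one inked subpixel per pixel and so carries no information.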


Journal ArticleDOI
TL;DR: Advances in Bayesian models of computer vision and in the measurement and modeling of natural image statistics are providing the tools to test and constrain theories of human object perception, which are having an impact on the interpretation of cortical function.

247 citations


Journal ArticleDOI
TL;DR: A foveation scalable video coding (FSVC) algorithm which supplies good quality-compression performance as well as effective rate scalability, and is adaptable to different applications, such as knowledge-based video coding and video communications over time-varying, multiuser and interactive networks.
Abstract: Image and video coding is an optimization problem. A successful image and video coding algorithm delivers a good tradeoff between visual quality and other coding performance measures, such as compression, complexity, scalability, robustness, and security. In this paper, we follow two recent trends in image and video coding research. One is to incorporate human visual system (HVS) models to improve the current state-of-the-art of image and video coding algorithms by better exploiting the properties of the intended receiver. The other is to design rate scalable image and video codecs, which allow the extraction of coded visual information at continuously varying bit rates from a single compressed bitstream. Specifically, we propose a foveation scalable video coding (FSVC) algorithm which supplies good quality-compression performance as well as effective rate scalability. The key idea is to organize the encoded bitstream to provide the best decoded video at an arbitrary bit rate in terms of foveated visual quality measurement. A foveation-based HVS model plays an important role in the algorithm. The algorithm is adaptable to different applications, such as knowledge-based video coding and video communications over time-varying, multiuser and interactive networks.
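The eccentricity-dependent weighting at the heart of such foveated coding can be sketched as below: per-pixel importance falls off with angular distance from the fixation point, and bits would be allocated in order of weight so that truncating the bitstream always keeps the foveal region sharpest. The half-resolution eccentricity and viewing distance are illustrative constants, not the paper's calibrated HVS model.

```python
import numpy as np

def foveation_weights(h, w, fix, half_res_ecc=2.3, view_dist=3.0):
    # Per-pixel importance falling off with eccentricity from the
    # fixation point `fix` (row, col). `half_res_ecc` (degrees) and
    # `view_dist` (in image widths) are illustrative constants.
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.hypot(ys - fix[0], xs - fix[1]) / w      # distance in image widths
    ecc = np.degrees(np.arctan(d / view_dist))      # eccentricity in degrees
    return half_res_ecc / (half_res_ecc + ecc)      # relative spatial cutoff
```

The weight is exactly 1 at the fixation point and decays monotonically with eccentricity, mirroring the fall-off of retinal resolution.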

Journal ArticleDOI
TL;DR: Progress in the development of flexible, generative models that can explain visual input as a combination of hidden variables and can adapt to new types of input are reviewed.

Journal ArticleDOI
TL;DR: A new method is proposed for quantitatively assessing the plausibility of the saliency-based model of visual attention by comparing its performance with human behavior: the model's saliency map and a human fixation density map share the same format and can be compared by qualitative and quantitative methods.
Abstract: Visual attention is the ability of the human vision system to detect salient parts of the scene, on which higher vision tasks, such as recognition, can focus. In human vision, it is believed that visual attention is intimately linked to the eye movements and that the fixation points correspond to the location of the salient scene parts. In computer vision, the paradigm of visual attention has been widely investigated and a saliency-based model of visual attention is now available that is commonly accepted and used in the field, despite the fact that its biological grounding has not been fully assessed. This work proposes a new method for quantitatively assessing the plausibility of this model by comparing its performance with human behavior. The basic idea is to compare the map of attention - the saliency map - produced by the computational model with a fixation density map derived from eye movement experiments. This human attention map can be constructed as an integral of single impulses located at the positions of the successive fixation points. The resulting map has the same format as the computer-generated map, and can easily be compared by qualitative and quantitative methods. Some illustrative examples using a set of natural and synthetic color images show the potential of the validation method to assess the plausibility of the attention model.
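The fixation-density construction described here is straightforward to sketch: place a Gaussian-blurred impulse at each fixation point, normalize, and compare the result with a computational saliency map, e.g. by Pearson correlation (one of several plausible comparison metrics).

```python
import numpy as np

def fixation_density_map(fixations, shape, sigma=5.0):
    # Integrate a blurred unit impulse at each fixation point so the map
    # is directly comparable to a computational saliency map.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    fmap = np.zeros(shape)
    for (fy, fx) in fixations:
        fmap += np.exp(-((ys - fy) ** 2 + (xs - fx) ** 2) / (2 * sigma ** 2))
    return fmap / fmap.max()

def map_correlation(a, b):
    # Simple quantitative comparison: Pearson correlation of the maps.
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

A saliency map that predicts the eye-movement data well yields a high correlation; qualitative comparison is just side-by-side inspection of the two maps.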

Journal ArticleDOI
TL;DR: By analyzing the time course of reaction times in a masked natural scene categorization paradigm, it is shown that the human visual system can generate selective motor responses based on a single feed-forward pass and feedback loops do not appear to be mandatory for visual processing.
Abstract: The ventral visual pathway implements object recognition and categorization in a hierarchy of processing areas with neuronal selectivities of increasing complexity. The presence of massive feedback connections within this hierarchy raises the possibility that normal visual processing relies on the use of computational loops. It is not known, however, whether object recognition can be performed at all without such loops (i.e., in a purely feed-forward mode). By analyzing the time course of reaction times in a masked natural scene categorization paradigm, we show that the human visual system can generate selective motor responses based on a single feed-forward pass. We confirm these results using a more constrained letter discrimination task, in which the rapid succession of a target and mask is actually perceived as a distractor. We show that a masked stimulus presented for only 26 msec - and often not consciously perceived - can fully determine the earliest selective motor responses: The neural representations of the stimulus and mask are thus kept separated during a short period corresponding to the feed-forward "sweep." Therefore, feedback loops do not appear to be "mandatory" for visual processing. Rather, we found that such loops allow the masked stimulus to reverberate in the visual system and affect behavior for nearly 150 msec after the feed-forward sweep.

Proceedings ArticleDOI
24 Nov 2003
TL;DR: This paper presents a new method to evaluate the quality of distorted images, based on a comparison between the structural information extracted from the distorted image and from the original image; the results are highly correlated with human judgments (mean opinion score).
Abstract: This paper presents a new method to evaluate the quality of distorted images. This method is based on a comparison between the structural information extracted from the distorted image and from the original image. The interest of our method is that it uses reduced references containing perceptual structural information. First, a quick overview of image quality evaluation methods is given. Then the implementation of our human visual system (HVS) model is detailed. Finally, results are given for quality evaluation of JPEG and JPEG2000 coded images. They show that our method provides results which are highly correlated with human judgments (mean opinion score). This method has been implemented in an application available on the Internet.

Proceedings ArticleDOI
18 Dec 2003
TL;DR: The proposed color correction method is based on the ACE model, an unsupervised color equalization algorithm: a perceptual approach inspired by some adaptation mechanisms of the human visual system, in particular lightness constancy and color constancy.
Abstract: We present in this paper some advances in color restoration of underwater images, especially with regard to the strong and non-uniform color cast that is typical of underwater images. The proposed color correction method is based on the ACE model, an unsupervised color equalization algorithm. ACE is a perceptual approach inspired by some adaptation mechanisms of the human visual system, in particular lightness constancy and color constancy. A perceptual approach presents many advantages: it is unsupervised, robust, and has local filtering properties, which lead to more effective results. The restored images give better results when displayed or processed (fish segmentation and feature extraction). The presented preliminary results are satisfying and promising.

Proceedings ArticleDOI
25 Jun 2003
TL;DR: This paper demonstrates how properties of the human visual system, in particular inattentional blindness, can be exploited to accelerate the rendering of animated sequences by applying a priori knowledge of a viewer's task focus.
Abstract: The perceived quality of computer graphics imagery depends on the accuracy of the rendered frames, as well as the capabilities of the human visual system. Fully detailed, high-fidelity frames still take many minutes, even hours, to render on today's computers. The human eye is physically incapable of capturing a moving scene in full detail. We sense image detail only in a 2° foveal region, relying on rapid eye movements, or saccades, to jump between points of interest. Our brain then reassembles these glimpses into a coherent, but inevitably imperfect, visual percept of the environment. In the process, we literally lose sight of the unimportant details. In this paper, we demonstrate how properties of the human visual system, in particular inattentional blindness, can be exploited to accelerate the rendering of animated sequences by applying a priori knowledge of a viewer's task focus. We show in a controlled experimental setting how human subjects will consistently fail to notice degradations in the quality of image details unrelated to their assigned task, even when these details fall under the viewers' gaze. We then build on these observations to create a perceptual rendering framework that combines predetermined task maps with spatiotemporal contrast sensitivity to guide a progressive animation system which takes full advantage of image-based rendering techniques. We demonstrate this framework with a Radiance ray-tracing implementation that completes its work in a fraction of the normally required time, with few noticeable artifacts for viewers performing the task.

Journal ArticleDOI
TL;DR: Psychophysical measurements establish a new role for color vision in determining the three-dimensional structure of an image: one that exploits the natural relationships that exist between color and luminance in the visual world.
Abstract: In natural scenes, chromatic variations, and the luminance variations that are aligned with them, mainly arise from surfaces such as flowers or painted objects. Pure or near-pure luminance variations, on the other hand, mainly arise from inhomogeneous illumination such as shadows or shading. Here, I provide evidence that knowledge of these color-luminance relationships is built into the machinery of the human visual system. When a pure-luminance grating is added to a differently oriented chromatic grating, the resulting 'plaid' appears to spring into three-dimensional relief, an example of 'shape-from-shading'. By psychophysical measurements, I found that the perception of shape-from-shading in the plaid was triggered when the chromatic and luminance gratings were not aligned, and suppressed when the gratings were aligned. This finding establishes a new role for color vision in determining the three-dimensional structure of an image: one that exploits the natural relationships that exist between color and luminance in the visual world.

Book
30 Sep 2003
TL;DR: Front-End Vision and Multi-Scale Image Analysis as discussed by the authors is a tutorial in multi-scale methods for computer vision and image processing, which is written in Mathematica, a high-level language for symbolic and numerical manipulations.
Abstract: Front-End Vision and Multi-Scale Image Analysis is a tutorial in multi-scale methods for computer vision and image processing. It builds on the cross fertilization between human visual perception and multi-scale computer vision (`scale-space') theory and applications. The multi-scale strategies recognized in the first stages of the human visual system are carefully examined, and taken as inspiration for the many geometric methods discussed. All chapters are written in Mathematica, a spectacular high-level language for symbolic and numerical manipulations. The book presents a new and effective approach to quickly mastering the mathematics of computer vision and image analysis. The typically short code is given for every topic discussed, and invites the reader to spend many fascinating hours `playing' with computer vision. Front-End Vision and Multi-Scale Image Analysis is intended for undergraduate and graduate students, and all with an interest in computer vision, medical imaging, and human visual perception.

Patent
15 Dec 2003
TL;DR: In this paper, a method of interfacing used on a network having a central computer system and a plurality of remote computer systems is provided, where each remote computer system includes a video display.
Abstract: A method of interfacing used on a network having a central computer system and a plurality of remote computer systems is provided. Each remote computer system includes a video display. The method includes the steps of creating a first visual representation of a first user on the visual display of the first computer system and a second visual representation of a second user on the visual display of the second computer system. The second visual representation is then displayed on the visual display of the first computer system and the first visual representation is displayed on the video display of the second computer system. Applied to video games, the method creates a first visual representation of a first player on a first remote computer system, identifies an interest and a skill level of the first player for at least one video game, indicates predetermined personal characteristics of the first player, saves the visual representation, interest, skill levels and personal characteristics of the first player, accesses the central computer system from the first remote computer system over telephone lines, selects a second player who has accessed the central computer system from a second remote computer system, and invites the second player to play a selected video game. The step of inviting allows the second player to access the visual representation, interest, skill levels and personal characteristics of the first player. The method of interacting is used on a network having a central computer system and a plurality of remote computer systems. Each remote computer system is operated by a user and has access to at least one predetermined application program. The method allows an application program to be employed by at least two primary users and then allows a different user to watch the action of the predetermined application program as it is employed by the primary users.
The computer network includes a central computer system, a plurality of remote computer systems connected to the central computer system over telephone lines, means for creating visual representations of users on the visual displays of the remote computer systems, means for sending the visual representation of a user from one remote computer system to a predetermined number of other remote computer systems, and means for running an application program between users of different remote computer systems.

Patent
11 Jul 2003
TL;DR: In this paper, a system and method for perceptual processing, organization, categorization, recognition, and manipulation of visual images and visual elements is presented, which utilizes a dynamic perceptual organization schema to adaptively drive image-processing sub-algorithms.
Abstract: A system and method for perceptual processing, organization, categorization, recognition, and manipulation of visual images and visual elements. The system utilizes a dynamic perceptual organization schema to adaptively drive image-processing sub-algorithms. The schema incorporates knowledge about the visual world, human perception and image categories within its structure. A fuzzy logic query control system integrates the knowledge base and image processing drivers.

Journal ArticleDOI
TL;DR: A novel watermarking scheme to ensure the authenticity of digital images using characteristics of the human visual system to maximize the embedding weights while keeping good perceptual transparency and an image-dependent method to evaluate the optimal quantization step allowing the tamper proofing of the image.

01 Jan 2003
TL;DR: A new kind of image representation in terms of local multimodal Primitives that are motivated by processing of the human visual system as well as by functional considerations is described.
Abstract: We describe a new kind of image representation in terms of local multimodal Primitives. These Primitives are motivated by processing of the human visual system as well as by functional considerations. We discuss analogies of our representation to human vision and concentrate specifically on the implications of the necessity of communication of information in a complex multi-modal system.

01 Jan 2003
TL;DR: A statistical method is proposed that does not distinguish between the auditory and visual data, but one that operates on a fused data set that finds audio/visual features that correspond to events depicted in the stream.
Abstract: This paper presents a methodology for extracting meaningful audio/visual features from video streams. We propose a statistical method that does not distinguish between the auditory and visual data, but one that operates on a fused data set. By doing so we discover audio/visual features that correspond to events depicted in the stream. Using these features, we can obtain a segmentation of the input video stream by separating independent auditory and visual events. ICA 2003

Journal ArticleDOI
TL;DR: This approach leads to enhanced computational efficiency by interpreting nonuniform-density foveated images on the uniform domain and by using a foveation protocol between the encoder and the decoder.
Abstract: This paper explores the problem of communicating high-quality, foveated video streams in real time. Foveated video exploits the nonuniform resolution of the human visual system by preferentially allocating bits according to the proximity to assumed visual fixation points, thus delivering perceptually high quality at greatly reduced bandwidths. Foveated video streams possess specific data density properties that can be exploited to enhance the efficiency of subsequent video processing. Here, we exploit these properties to construct several efficient foveated video processing algorithms: foveation filtering (local bandwidth reduction), motion estimation, motion compensation, video rate control, and video postprocessing. Our approach leads to enhanced computational efficiency by interpreting nonuniform-density foveated images on the uniform domain and by using a foveation protocol between the encoder and the decoder.
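Foveation filtering (the local bandwidth reduction step) can be approximated by blending progressively blurred copies of a frame according to distance from the fixation point. The sketch below uses box blurs and hard eccentricity bands purely for illustration; a real foveation filter would use a smooth, calibrated cutoff.

```python
import numpy as np

def box_blur(img, k):
    # Separable box blur of odd width k (identity for k == 1).
    if k <= 1:
        return img.copy()
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    kern = np.ones(k) / k
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kern, mode='valid'),
                              1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode='valid'),
                               0, tmp)

def foveation_filter(img, fix, levels=(1, 3, 7, 15)):
    # Pixels far from the fixation point take their values from
    # progressively more blurred copies of the frame.
    h, w = img.shape
    blurred = [box_blur(img.astype(float), k) for k in levels]
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.hypot(ys - fix[0], xs - fix[1])
    idx = np.minimum((d / (d.max() / len(levels))).astype(int), len(levels) - 1)
    out = np.zeros((h, w))
    for i, b in enumerate(blurred):
        out[idx == i] = b[idx == i]
    return out
```

High frequencies are removed only where the model says they cannot be resolved, which is what lets subsequent coding spend fewer bits in the periphery.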

Journal ArticleDOI
TL;DR: The objects that the subjects consistently failed to report elicited a significant negative priming effect when presented in a subsequent task, suggesting that their identity was represented in high-level cortical areas of the visual system, before the corresponding neural activity was suppressed during attentional selection.
Abstract: When a visual scene, containing many discrete objects, is presented to our retinae, only a subset of these objects will be explicitly represented in visual awareness. The number of objects accessing short-term visual memory might be even smaller. Finally, it is not known to what extent “ignored” objects (those that do not enter visual awareness) will be processed, or recognized. By combining free recall, forced-choice recognition and visual priming paradigms for the same natural visual scenes and subjects, we were able to estimate these numbers, and provide insights as to the fate of objects that are not explicitly recognized in a single fixation. When presented for 250 ms with a scene containing 10 distinct objects, human observers can remember up to 4 objects with full confidence, and between 2 and 3 more when forced to guess. Importantly, the objects that the subjects consistently failed to report elicited a significant negative priming effect when presented in a subsequent task, suggesting that their identity was represented in high-level cortical areas of the visual system, before the corresponding neural activity was suppressed during attentional selection. These results shed light on neural mechanisms of attentional competition, and representational capacity at different levels of the human visual system.

Proceedings ArticleDOI
01 Jan 2003
TL;DR: A novel algorithm is presented for embedding a binary image as a watermark into the DC components of the DCT coefficients of 8×8 blocks that incorporates the feature of texture masking and luminance masking of the human visual system into watermarking.
Abstract: In this paper, we present a novel algorithm for embedding a watermark into a still host image in the DCT domain. Unlike the traditional techniques, we embed a binary image as a watermark into the DC components of the DCT coefficients of 8×8 blocks. In the proposed algorithm, we incorporate the feature of texture masking and luminance masking of the human visual system into watermarking. The algorithm recovers the watermark without any reference to the original image. Experimental results demonstrate that the invisible watermarks embedded with the proposed algorithm are robust to various attacks.
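A stripped-down version of DC-term embedding can be sketched with quantization index modulation on 8×8 block means (the DC coefficient, up to scale). The fixed step below stands in for the paper's HVS-adaptive texture/luminance masking, which would modulate the step per block; extraction is blind, needing no reference image.

```python
import numpy as np

def embed_dc(img, bits, step=8.0):
    # Shift each 8x8 block's mean onto the quantization grid (bit 0) or
    # halfway between grid points (bit 1). A real HVS-adaptive scheme
    # would choose `step` per block from texture/luminance masking.
    out = img.astype(float).copy()
    h, w = img.shape
    k = 0
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            block = out[i:i + 8, j:j + 8]
            m = block.mean()
            target = np.round(m / step) * step + (step / 2.0) * bits[k]
            block += target - m          # shifts the view inside `out`
            k += 1
    return out

def extract_dc(img, nbits, step=8.0):
    # Blind extraction: read each block mean's position on the grid.
    bits = []
    h, w = img.shape
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            m = img[i:i + 8, j:j + 8].mean()
            r = m - np.round(m / step) * step
            bits.append(int(abs(r) > step / 4.0))
    return bits[:nbits]
```

Because the decision threshold sits at a quarter step, the embedded bits survive any distortion that moves a block mean by less than step/4, which is the usual robustness/imperceptibility trade-off controlled by the step size.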

Proceedings ArticleDOI
18 Jun 2003
TL;DR: The proposed model is used as the underlying framework in which a system for detecting and recognizing road signs is developed, and consists of three major components: sensory, perceptual, and conceptual components.
Abstract: We propose a computational model motivated by human cognitive processes for detecting changes of driving environments. The model, called dynamic visual model, consists of three major components: sensory, perceptual, and conceptual components. The proposed model is used as the underlying framework in which a system for detecting and recognizing road signs is developed.

Journal ArticleDOI
TL;DR: A neural model of texture processing which integrates the data obtained by a variety of methods into a common computational framework and makes it possible to link human performance in texture segmentation with model cell activation patterns, in turn permitting fundamental psychophysical results on texture processing to be traced back to their putative neural origins.

Journal ArticleDOI
TL;DR: The physical approach to color constancy offered in the paper confirms relational color constancy as a first step in color-constant vision systems and raises the question of whether the illuminant is estimated at all in pre-attentive vision.