
Showing papers on "Human visual system model published in 1997"


Proceedings ArticleDOI
03 Aug 1997
TL;DR: A method is presented for recovering high dynamic range radiance maps from conventionally acquired photographs; the work is applicable in many areas of computer graphics involving digitized photographs, including image-based modeling, image compositing, and image processing, and a few applications of high dynamic range radiance maps are demonstrated.
Abstract: We present a method of recovering high dynamic range radiance maps from photographs taken with conventional imaging equipment. In our method, multiple photographs of the scene are taken with different amounts of exposure. Our algorithm uses these differently exposed photographs to recover the response function of the imaging process, up to a factor of scale, using the assumption of reciprocity. With the known response function, the algorithm can fuse the multiple photographs into a single, high dynamic range radiance map whose pixel values are proportional to the true radiance values in the scene. We demonstrate our method on images acquired with both photochemical and digital imaging processes. We discuss how this work is applicable in many areas of computer graphics involving digitized photographs, including image-based modeling, image compositing, and image processing. Lastly, we demonstrate a few applications of having high dynamic range radiance maps, such as synthesizing realistic motion blur and simulating the response of the human visual system.
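The fusion step lends itself to a compact sketch. The following numpy illustration assumes an already-linearized camera response (the paper's algorithm first recovers the generally nonlinear response curve from the exposure stack); the hat-shaped weighting function is one common choice for down-weighting under- and over-exposed pixels, not the paper's exact formulation.

```python
import numpy as np

def fuse_exposures(images, exposure_times):
    """Fuse differently exposed images of one scene into a radiance map.

    Assumes a linear camera response for simplicity; images are float
    arrays in [0, 1], exposure_times are in seconds.
    """
    def weight(z):
        # Hat function: trust mid-range pixels, distrust clipped ones.
        return 1.0 - np.abs(2.0 * z - 1.0)

    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        w = weight(img)
        num += w * (img / t)   # per-exposure radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)

# Two exposures of a scene with true radiance 0.4, at 1 s and 0.5 s.
scene = np.full((2, 2), 0.4)
imgs = [np.clip(scene * 1.0, 0, 1), np.clip(scene * 0.5, 0, 1)]
radiance = fuse_exposures(imgs, [1.0, 0.5])
```

Because both exposures agree on the underlying radiance, the weighted average recovers it exactly; with real photographs the weighting suppresses the clipped pixels of each exposure.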

2,967 citations


Journal ArticleDOI
TL;DR: A demonstration of ACT-R's application to menu selection is discussed and it is shown that the ACT-R theory makes unique predictions, without estimating any parameters, about the time to search a menu.
Abstract: The ACT-R system is a general system for modeling a wide range of higher level cognitive processes. Recently, it has been embellished with a theory of how its higher level processes interact with a visual interface. This includes a theory of how visual attention can move across the screen, encoding information into a form that can be processed by ACT-R. This system is applied to modeling several classic phenomena in the literature that depend on the speed and selectivity with which visual attention can move across a visual display. ACT-R is capable of interacting with the same computer screens that subjects do and, as such, is well suited to provide a model for tasks involving human-computer interaction. In this article, we discuss a demonstration of ACT-R's application to menu selection and show that the ACT-R theory makes unique predictions, without estimating any parameters, about the time to search a menu. These predictions are confirmed.

488 citations


Proceedings ArticleDOI
26 Oct 1997
TL;DR: A new watermarking technique to add a code to digital images is presented; the method operates in the frequency domain embedding a pseudo-random sequence of real numbers in a selected set of DCT coefficients.
Abstract: Digital watermarking has been proposed as a viable solution to the need of copyright protection and authentication of multimedia data in a networked environment, since it makes it possible to identify the author, owner, distributor or authorized consumer of a document. In this paper a new watermarking technique to add a code to digital images is presented; the method operates in the frequency domain embedding a pseudo-random sequence of real numbers in a selected set of DCT coefficients. Watermark casting is performed by exploiting the masking characteristics of the human visual system, to ensure watermark invisibility. The embedded sequence is extracted without resorting to the original image, so that the proposed technique represents a major improvement to methods relying on the comparison between the watermarked and original images. Experimental results demonstrate that the watermark is robust to most of the signal processing techniques and geometric distortions.
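The embedding and blind detection steps can be sketched on a vector of pre-selected coefficients. This is a hedged illustration, not the paper's exact scheme: the coefficient selection, the strength parameter `alpha`, and the detector form are placeholders, and a real implementation would operate on the zig-zag-ordered DCT coefficients of the image.

```python
import numpy as np

def embed(coeffs, key, alpha=0.2, k=1000):
    """Add a keyed pseudo-random sequence to the first k coefficients
    (assumed to be the pre-selected mid-frequency DCT terms)."""
    x = np.random.default_rng(key).standard_normal(k)
    marked = coeffs.copy()
    marked[:k] += alpha * np.abs(marked[:k]) * x  # strength scales with coeff
    return marked

def detect(coeffs, key, k=1000):
    """Blind detector: correlate against the keyed sequence.
    No access to the original unmarked image is needed."""
    x = np.random.default_rng(key).standard_normal(k)
    return float(np.mean(coeffs[:k] * x))

rng = np.random.default_rng(0)
coeffs = rng.standard_normal(4096)   # stand-in for selected DCT coefficients
marked = embed(coeffs, key=42)
```

With the correct key the correlation concentrates near `alpha` times the mean coefficient magnitude; with a wrong key it stays near zero, which is what makes blind detection possible.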

430 citations


Proceedings ArticleDOI
26 Oct 1997
TL;DR: An approach for still image watermarking is presented in which the watermark embedding process employs multiresolution fusion techniques and incorporates a model of the human visual system (HVS); the original unmarked image is required to extract the watermark.
Abstract: We present an approach for still image watermarking in which the watermark embedding process employs multiresolution fusion techniques and incorporates a model of the human visual system (HVS). The original unmarked image is required to extract the watermark. Simulation results demonstrate the high robustness of the algorithm to such image degradations as JPEG compression, additive noise and linear filtering.

325 citations


Patent
09 May 1997
TL;DR: In this article, a computer implemented method of image generation that employs adaptive, progressive, perception-based spatio-temporal importance sampling to reduce the cost of image generation is presented.
Abstract: A computer implemented method of image generation that employs adaptive, progressive, perception-based spatio-temporal importance sampling to reduce the cost of image generation. The method uses an adaptive approach to sampling which employs refinement criteria based on specific spatio-temporal limits of human vision. By using these refinement criteria the method produces an image with a spatio-temporal structure that is closely matched to the spatio-temporal limits of human vision. Using one sampling criterion the spatial sampling density is adjusted in proportion to the sampled region's exposure duration in a manner that substantially reflects the visual system's increase in acuity as a function of exposure time. Using other criteria the spatial and temporal sampling frequencies of a region of the image stream are adjusted based on the measured or estimated retinal velocity of the sampled element in a manner that substantially reflects both dynamic visual acuity limits of vision and the critical temporal sampling frequency for the perception of smooth motion. The method includes image-parallel, shared-memory multiprocessor implementations based on sample reprojection and primitive reprojection, the latter using a technique of adaptive rasterization. In these implementations the natural temporal image coherence and temporal visibility coherence of the image stream produce a temporal locality of data reference that enhances performance of the system. Because temporal image coherence and temporal visibility coherence increase the spatial resolving performance of the human visual system, the performance of the present method parallels the performance of human vision, making performance degradations relatively invisible to the user.

278 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined audio/visual modality effects from a cognitive load perspective in three experiments using geometry instruction, finding that the additional processing capacity provided by an audio/visual format only enhances learning if mental resources are not devoted to extensive visually based search to coordinate auditory and visual information.
Abstract: Advances in our knowledge of the structure of working memory suggest that under some circumstances, effectively more processing capacity is available to learners if instructional materials use multiple information modes (e.g. auditory and visual) instead of equivalent single mode formats. This paper examined this modality effect from a cognitive load perspective in three experiments using geometry instruction. In accordance with cognitive load theory, it was predicted that the additional processing capacity provided in an audio/visual format would only enhance learning if mental resources were not devoted to extensive visual based search in order to coordinate auditory and visual information. Using two different areas of geometry, Experiments 1 and 2 found that if visual search was clearly high, then audio‐visual instruction was only beneficial if visual indicators in the form of electronic flashing were incorporated into the instructional format. Under high search conditions, a standard audio/visual form...

274 citations


Proceedings ArticleDOI
03 Aug 1997
TL;DR: A computational model of visual masking based on psychophysical data is developed that allows us to choose texture patterns for computer graphics images that hide the effects of faceting, banding, aliasing, noise and other visual artifacts produced by sources of error in graphics algorithms.
Abstract: In this paper we develop a computational model of visual masking based on psychophysical data. The model predicts how the presence of one visual pattern affects the detectability of another. The model allows us to choose texture patterns for computer graphics images that hide the effects of faceting, banding, aliasing, noise and other visual artifacts produced by sources of error in graphics algorithms. We demonstrate the utility of the model by choosing a texture pattern to mask faceting artifacts caused by polygonal tessellation of a flat-shaded curved surface. The model predicts how changes in the contrast, spatial frequency, and orientation of the texture pattern, or changes in the tessellation of the surface will alter the masking effect. The model is general and has uses in geometric modeling, realistic image synthesis, scientific visualization, image compression, and image-based rendering. CR Categories: I.3.0 [Computer Graphics]: General;

236 citations


Journal ArticleDOI
TL;DR: By examining the effects of several factors on symmetry detection, this research has revealed some important characteristics of how humans perceive symmetry, which constrain putative underlying mechanisms and models of human symmetry detection.

218 citations



Journal ArticleDOI
TL;DR: The results support the hypothesis that the human visual system incorporates a stationary light-source constraint in the perceptual processing of spatial layout of scenes.
Abstract: Phenomenally strong visual illusions are described in which the motion of an object's cast shadow determines the perceived 3-D trajectory of the object. Simply adjusting the motion of a shadow is sufficient to induce dramatically different apparent trajectories of the object casting the shadow. Psychophysical results obtained with the use of 3-D graphics are reported which show that: (i) the information provided by the motion of an object's shadow overrides other strong sources of information and perceptual biases, such as the assumption of constant object size and a general viewpoint; (ii) the natural constraint of shadow darkness plays a role in the interpretation of a moving image patch as a shadow, but under some conditions even unnatural light shadows can induce apparent motion in depth of an object; (iii) when shadow motion is caused by a moving light source, the visual system incorrectly interprets the shadow motion as consistent with a moving object, rather than a moving light source. The results support the hypothesis that the human visual system incorporates a stationary light-source constraint in the perceptual processing of spatial layout of scenes.

170 citations


Patent
02 Dec 1997
TL;DR: In this article, a system, method, and article of manufacture for displaying visual primitives of a transaction flow through a transaction processing system is described, where a visual representation of the transaction flow containing visual primitives is accessed from a storage device by a digital computer.
Abstract: A system, method, and article of manufacture for displaying visual primitives of a transaction flow through a transaction processing system. A visual representation of a transaction flow containing visual primitives is accessed from a storage device by a digital computer. The digital computer is then used to display the visual primitives of the transaction flow on a visual display in a flexible manner. The visual primitives can be dynamically sized and may display properties of each visual primitive. Configuration information associated with the transaction flow is also shown on the visual display.

Journal ArticleDOI
TL;DR: In this paper, three hierarchical multiresolution image fusion techniques are implemented and tested using image data from the Airborne Visual/Infrared Imaging Spectrometer (AVIRIS) hyperspectral sensor.
Abstract: Three hierarchical multiresolution image fusion techniques are implemented and tested using image data from the Airborne Visual/Infrared Imaging Spectrometer (AVIRIS) hyperspectral sensor. The methods presented focus on combining multiple images from the AVIRIS sensor into a smaller subset of images while maintaining the visual information necessary for human analysis. Two of the techniques are published algorithms that were originally designed to combine images from multiple sensors, but are shown to work well on multiple images from the same sensor. The third method presented was developed specifically to fuse hyperspectral images for visual analysis. This new method uses the spatial frequency response (contrast sensitivity) of the human visual system to determine which features in the input images need to be preserved in the composite image(s), thus ensuring the composite image maintains the visually relevant features from each input image. The image fusion algorithms are analyzed using test images with known image characteristics and image data from the AVIRIS hyperspectral sensor. After analyzing the signal-to-noise ratios and visual aesthetics of the fused images, contrast sensitivity based fusion is shown to provide excellent fusion results and, in every case, outperformed the other two methods.
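A greatly simplified selection-fusion sketch conveys the idea. Here local gradient energy stands in for the contrast-sensitivity weighting; the actual method filters each image by the HVS spatial frequency response before deciding which features to preserve.

```python
import numpy as np

def fuse_max_salience(images):
    """Select, per pixel, the input image with the highest local
    gradient energy (a crude stand-in for CSF-weighted salience)."""
    stack = np.stack([img.astype(float) for img in images])  # (n, h, w)
    sal = np.empty_like(stack)
    for i, img in enumerate(stack):
        gy, gx = np.gradient(img)
        sal[i] = gx ** 2 + gy ** 2
    winner = sal.argmax(axis=0)                  # (h, w) index map
    return np.take_along_axis(stack, winner[None], axis=0)[0]

# A detailed ramp should win everywhere over a featureless image.
ramp = np.tile(np.arange(8.0), (8, 1))
flat = np.full((8, 8), 3.5)
fused = fuse_max_salience([ramp, flat])
```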

Proceedings ArticleDOI
G.W. Braudaway1
26 Oct 1997
TL;DR: The method presented exploits the not well understood but superb ability of the human visual system to recognize a correlated pattern in a scatter diagram called a "visualizer-coincidence image."
Abstract: A method is presented for marking high-quality digital images with a robust and invisible watermark. A broad definition of robustness, stated as fundamental, is used. It requires the invisible mark to survive and remain detectable through all image manipulations that do not in themselves damage the image beyond usability. These manipulations include JPEG "lossy" compression and, in the extreme, the printing and rescanning of the image. The watermark is imparted onto an image as a random, but reproducible, small modulation of its pixel brightnesses, and becomes a permanent part of the marked image. Detecting the imparted watermark, especially after image manipulation, is a daunting task. It is one of detecting the presence of a known small modulation of a random carrier where the carrier is composed of the pixel brightness values of the unmarked image. The method presented exploits the not well understood but superb ability of the human visual system to recognize a correlated pattern in a scatter diagram called a "visualizer-coincidence image." Results of application of the method are presented.

Proceedings ArticleDOI
23 Jun 1997
TL;DR: This work proposes a watermarking technique for digital images that is based on utilizing visual models which have been developed in the context of image compression, and is shown to provide very good results both in terms of image transparency and robustness.
Abstract: Content providers on the Internet are faced with the problem of how to secure electronic data. This problem has generated research activity in the area of digital watermarking of electronic content. The challenge is to introduce a digital watermark that is both transparent and highly robust to common signal processing and possible attacks. The two basic requirements for an effective watermarking scheme, robustness and transparency, conflict with each other. We propose a watermarking technique for digital images that is based on utilizing visual models which have been developed in the context of image compression. The visual models give us a direct way to determine the maximum amount of watermark signal that each portion of an image can tolerate without affecting the visual quality of the image. This allows us to provide the maximum strength watermark which in turn, is extremely robust to common image processing and editing such as JPEG compression, rescaling, and cropping. Our watermarking scheme is based on a DCT framework which allows for the possibility of directly watermarking the JPEG bitstream. Our scheme is shown to provide very good results both in terms of image transparency and robustness.

Book
01 Dec 1997
TL;DR: Computer Vision and Image Processing brings together the theory of computer imaging with the tools needed for practical research and development, and is a solid introduction for anyone who uses computer imaging.
Abstract: From the Publisher: True computer imaging for engineers! Digital signal processing has long been the domain of electrical engineers, while the manipulation of image data has been handled by computer scientists. The convergence of these two specialties in the field of Computer Vision and Image Processing (CVIP) is the subject of this pragmatic book, written from an applications perspective and accompanied by its own educational and development software environment, CVIPtools. Illustrated with hundreds of examples, Computer Vision and Image Processing brings together the theory of computer imaging with the tools needed for practical research and development. The first part of Computer Vision and Image Processing presents a system model for each of the major application areas of CVIP, relating each specific algorithm to the overall process of applications development. The areas covered are image analysis, image restoration, image enhancement, and image compression. The second half focuses on the use of the CVIPtools environment, the software developed especially by the author and included on the accompanying CD-ROM. These advanced chapters discuss software features and applications, the CVIPtools software development environment, and library descriptions and function prototypes. CVIPtools is a GUI-based application, which includes an extended Tcl shell, that is ANSI-C compatible and runs on most flavors of UNIX and Windows NT/95. To get the most out of Computer Vision and Image Processing, a basic background in mathematics and computers is necessary. Knowledge of the C programming language will enhance the usefulness of the algorithms used in programming, and an understanding of signal and system theory is helpful in mastering transforms and compression.
Engineers, programmers, graphics specialists, multimedia developers, and medical imaging professionals will all appreciate Computer Vision and Image Processing's solid introduction for anyone who uses computer imaging.

Journal ArticleDOI
TL;DR: Bottom-up hierarchical processing among visual cortical areas has been revealed in experiments that have correlated brain activations with human perceptual experience, and top-down modulation of activity within visual cortical areas has been demonstrated through studies of higher cognitive processes such as attention and memory.

Journal ArticleDOI
TL;DR: This paper formalizes the problem of 3-D structure and motion reconstruction as the estimation of the state of certain nonlinear dynamical models and studies the feasibility of ‘structure from motion’ by analyzing the observability of such models.

Journal ArticleDOI
Thrasyvoulos N. Pappas1
TL;DR: This work proposes a specific printer model that accounts for overlap between neighboring dots of ink and the spectral absorption properties of the inks, and shows that when a simple "one-minus-RGB" relationship between the red, green, and blue image specification and the corresponding cyan, magenta, and yellow inks is assumed, the algorithms are separable.
Abstract: We present a new class of models for color printers. They form the basis for model-based techniques that exploit the characteristics of the printer and the human visual system to maximize the quality of the printed images. We present two model-based techniques, the modified error diffusion (MED) algorithm and the least-squares model-based (LSMB) algorithm. Both techniques are extensions of the gray-scale model-based techniques and produce images with high spatial resolution and visually pleasant textures. We also examine the use of printer models for designing blue-noise screens. The printer models can account for a variety of printer characteristics. We propose a specific printer model that accounts for overlap between neighboring dots of ink and the spectral absorption properties of the inks. We show that when we assume a simple "one-minus-RGB" relationship between the red, green, and blue image specification and the corresponding cyan, magenta, and yellow inks, the algorithms are separable. Otherwise, the algorithms are not separable and the modified error diffusion may be unstable. The experimental results consider the separable algorithms that produce high-quality images for applications where the exact colorimetric reproduction of color is not necessary. They are computationally simple and robust to errors in color registration, but the colors are device dependent.
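For reference, the plain error-diffusion loop that model-based MED builds on can be sketched as follows. These are the classic Floyd-Steinberg weights; the paper's modification inserts a printer dot-overlap model into the quantization step, which this sketch omits.

```python
import numpy as np

def error_diffusion(img):
    """Classic Floyd-Steinberg error diffusion on a grayscale image
    in [0, 1]; returns a binary halftone."""
    work = img.astype(float)          # working copy accumulates errors
    out = np.zeros_like(work)
    h, w = work.shape
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            new = 1.0 if old >= 0.5 else 0.0   # hard threshold
            out[y, x] = new
            err = old - new
            # Diffuse the quantization error to unvisited neighbors.
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return out

gray = np.full((16, 16), 0.25)
halftone = error_diffusion(gray)
```

Because the diffused errors sum to the quantization residual, the halftone's mean ink density tracks the input gray level.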

Proceedings ArticleDOI
26 Oct 1997
TL;DR: This work demonstrates how the DBS algorithm exploits the model for the HVS to efficiently yield very high quality halftones.
Abstract: The direct binary search (DBS) algorithm is an iterative method which minimizes a metric of error between the grayscale original and halftone image. This is accomplished by adjusting an initial halftone until a local minimum of the metric is achieved at each pixel. The metric incorporates a model for the human visual system (HVS). In general, the DBS time complexity and halftone quality depend on three factors: the HVS model parameters, the choice of initial halftone, and the search strategy used to update the halftone. Despite the complexity of the DBS algorithm, it can be implemented with surprising efficiency. We demonstrate how the algorithm exploits the model for the HVS to efficiently yield very high quality halftones.
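A toy version of the DBS loop illustrates the structure. The 3x3 binomial blur below is a crude stand-in for the HVS model, and a real implementation would use incremental error updates and pixel swaps rather than recomputing the full metric per toggle.

```python
import numpy as np

def blur(x):
    """Separable 3x3 binomial low-pass, standing in for the HVS filter."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    p = np.pad(x, 1, mode='edge')
    x = k[0] * p[:-2, 1:-1] + k[1] * p[1:-1, 1:-1] + k[2] * p[2:, 1:-1]
    p = np.pad(x, 1, mode='edge')
    return k[0] * p[1:-1, :-2] + k[1] * p[1:-1, 1:-1] + k[2] * p[1:-1, 2:]

def dbs(gray, sweeps=5):
    """Toy direct binary search: toggle pixels whenever that lowers
    the filtered squared error between halftone and original."""
    ht = (gray >= 0.5).astype(float)          # initial halftone
    best = np.sum(blur(ht - gray) ** 2)
    for _ in range(sweeps):
        improved = False
        for y in range(gray.shape[0]):
            for x in range(gray.shape[1]):
                ht[y, x] = 1.0 - ht[y, x]     # trial toggle
                e = np.sum(blur(ht - gray) ** 2)
                if e < best:
                    best, improved = e, True  # keep the toggle
                else:
                    ht[y, x] = 1.0 - ht[y, x] # revert
        if not improved:
            break                             # local minimum reached
    return ht

gray = np.full((8, 8), 0.25)
halftone = dbs(gray)
```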

Journal ArticleDOI
TL;DR: It is shown that the LIP model is a powerful and tractable framework for handling the contrast notion through a survey of several LIP-model-based contrast estimators associated with special subparts of intensity images, that are justified both from a physical and mathematical point of view.
Abstract: The logarithmic image processing (LIP) model is a mathematical framework based on abstract linear mathematics which provides a set of specific algebraic and functional operations that can be applied to the processing of intensity images valued in a bounded range. The LIP model has been proved to be physically justified in the setting of transmitted light and to be consistent with several laws and characteristics of the human visual system. Successful application examples have also been reported in several image processing areas, e.g., image enhancement, image restoration, three-dimensional image reconstruction, edge detection and image segmentation. The aim of this article is to show that the LIP model is a tractable mathematical framework for image processing which is consistent with several laws and characteristics of human brightness perception. This is a survey article in the sense that it presents (almost) previously published results in a revised, refined and self-contained form. First, an introduction to the LIP model is presented. Emphasis will be especially placed on the initial motivation and goal, and on the scope of the model. Then, an introductory summary of mathematical fundamentals of the LIP model is detailed. Next, the article aims at surveying the connections of the LIP model with several laws and characteristics of human brightness perception, namely the brightness scale inversion, saturation characteristic, Weber's and Fechner's laws, and the psychophysical contrast notion. Finally, it is shown that the LIP model is a powerful and tractable framework for handling the contrast notion. This is done through a survey of several LIP-model-based contrast estimators associated with special subparts (point, pair of points, boundary, region) of intensity images, that are justified both from a physical and mathematical point of view.
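The model's two basic operations are simple enough to state directly. A minimal sketch with gray tones valued in [0, M), here with M = 256; the consistency check that LIP-adding f to itself equals LIP-multiplying f by 2 follows from the definitions.

```python
M = 256.0  # upper bound of the bounded gray-tone range

def lip_add(f, g):
    """LIP addition: models the superposition of two
    transmitting/absorbing media."""
    return f + g - f * g / M

def lip_scale(lam, f):
    """LIP scalar multiplication: lam-fold LIP-addition of f
    with itself (defined for real lam)."""
    return M - M * (1.0 - f / M) ** lam

a = 100.0
```

Note that `lip_add` never leaves the bounded range [0, M), which is the point of the framework: operations stay physically meaningful for transmitted-light images.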

Proceedings ArticleDOI
03 Jun 1997
TL;DR: In this article, an image similarity metric for content-based image database search is proposed based on a multiscale model of the human visual system, which includes channels which account for perceptual phenomena such as color, contrast, color-contrast and orientation selectivity.
Abstract: In this paper we present an image similarity metric for content-based image database search. The similarity metric is based on a multiscale model of the human visual system. This multiscale model includes channels which account for perceptual phenomena such as color, contrast, color-contrast and orientation selectivity. From these channels, we extract features and then form an aggregate measure of similarity using a weighted linear combination of the feature differences. The choice of features and weights is made to maximize the consistency with similarity ratings made by human subjects. In particular, we use a visual test to collect experimental image matching data. We then define a cost function relating the distances computed by the metric to the choices made by the human subject. The results indicate that features corresponding to contrast, color-contrast and orientation can significantly improve search performance. Furthermore, the systematic optimization and evaluation strategy using the visual test is a general tool for designing and evaluating image similarity metrics.

Journal ArticleDOI
TL;DR: This paper proposes the use of a wavelet based perceptual metric which incorporates the frequency response of the Human Visual System and examines its effectiveness in providing insights for common operations of an image synthesis algorithm (e.g., blurring).
Abstract: It is often the case that images generated by image synthesis algorithms are judged by visual examination. The user resorts to an iterative refinement process of inspection and rendering until a satisfactory image is obtained. In this paper we propose quantitative metrics to compare images that arise from an image synthesis algorithm. The intent is to be able to guide the refinement process inherent in image synthesis. The Mean-Square-Error (MSE) has been traditionally employed to guide this process. However, it is not a viable metric for image synthesis control. We propose the use of a wavelet based perceptual metric which incorporates the frequency response of the Human Visual System. A useful aspect of the wavelet based metric is its ability to selectively measure the changes to structures of different sizes and scales in specific locations. Also, by resorting to the use of wavelets of various degrees of regularity, one can seek different levels of smoothness in an image. It is rare that such level of control can be obtained from a metric other than a wavelet based metric. We show the usefulness of our metric by examining its effectiveness in providing insights for common operations of an image synthesis algorithm (e.g., blurring). We also provide some examples of its use in rendering algorithms frequently used in graphics.

Proceedings ArticleDOI
03 Jun 1997
TL;DR: This paper outlines a new model of the human visual system (HVS) and shows how this model can be used in image quality assessment and departs from previous approaches in three ways.
Abstract: Reliable image quality assessments are necessary for evaluating digital imaging methods (halftoning techniques) and products (printers, displays). Typically the quality of the imaging method or product is evaluated by comparing the fidelity of an image before and after processing by the imaging method or product. It is well established that simple approaches like mean squared error do not provide meaningful measures of image fidelity. A number of image fidelity metrics have been developed whose goal was to predict the amount of differences that would be visible to a human observer. In this paper we outline a new model of the human visual system (HVS) and show how this model can be used in image quality assessment. Our model departs from previous approaches in three ways: (1) We use a physiologically and psychophysically plausible Gabor pyramid to model a receptive field decomposition; (2) We use psychophysical experiments that directly assess the percept we wish to model; and (3) We model discrimination performance by using discrimination thresholds instead of detection thresholds. The first psychophysical experiment tested the visual system's sensitivity as a function of spatial frequency, orientation, and average luminance. The second experiment tested the relation between contrast detection and contrast discrimination.

01 Jan 1997
TL;DR: This tutorial paper describes the log-polar mapping that approximates the retino-cortical projection of the eye's retinal image, along with its main properties.
Abstract: One interesting feature of the human visual system is the topological transformation of the retinal image into its cortical projection. The excitation of the cortex can be approximated by a log-polar mapping of the eye’s retinal image. In this tutorial paper we describe the log-polar mapping and its main properties.
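A minimal sampling version of the mapping can be sketched as follows; ring radii grow geometrically so that equal steps in the output correspond to logarithmic steps in retinal eccentricity. The grid sizes here are illustrative, not from the paper.

```python
import numpy as np

def log_polar_sample(img, n_rings=32, n_wedges=64, r_min=1.0):
    """Sample an image on a log-polar grid centred on the image centre,
    mimicking the dense-fovea / sparse-periphery retino-cortical map."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # Geometric ring spacing: equal cortical steps = log radial steps.
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    thetas = 2 * np.pi * np.arange(n_wedges) / n_wedges
    out = np.empty((n_rings, n_wedges))
    for i, r in enumerate(radii):
        ys = np.clip(np.round(cy + r * np.sin(thetas)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + r * np.cos(thetas)).astype(int), 0, w - 1)
        out[i] = img[ys, xs]
    return out

cortical = log_polar_sample(np.full((65, 65), 5.0))
```

A useful property of this representation is that rotation of the input about the centre becomes (approximately) a circular shift along the wedge axis, and scaling becomes a shift along the ring axis.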

Proceedings ArticleDOI
02 Jul 1997
TL;DR: A new watermarking system for digital images is presented: the method embeds a sequence of random real numbers in a selected set of DCT coefficients to ensure the watermark invisibility.
Abstract: Digital watermarking has been proposed as a means to protect the copyright of multimedia data in a networked environment, since it makes it possible to tightly embed a code into a digital document allowing the identification of the data owner. A new watermarking system for digital images is presented: the method embeds a sequence of random real numbers in a selected set of DCT coefficients. Embedding is performed by exploiting the masking characteristics of the human visual system, to ensure the watermark invisibility. The embedded sequence can be extracted without resorting to the original uncorrupted image. Experimental results demonstrate that the watermark is robust to several signal processing techniques and geometric distortions.

Journal ArticleDOI
TL;DR: It is suggested that in two-dimensional translation invariance, as in three-dimensional rotation invariance, the human visual system is relying on memory-intensive rather than computation-intensive processes.
Abstract: Invariance of object recognition to translation in the visual field is a fundamental property of human pattern vision. In three experiments we investigated this capability by training subjects to distinguish between random checkerboard stimuli. We show that the improvement of discrimination performance does not transfer across the visual field if learning is restricted to a particular location in the retinal image. Accuracy after retinal translation shows no sign of decay over time and remains at the same level it had at the beginning of the training. It is suggested that in two-dimensional translation invariance, as in three-dimensional rotation invariance, the human visual system is relying on memory-intensive rather than computation-intensive processes. Multiple position- and stimulus-specific learning events may be required before recognition is independent of retinal location.

Proceedings ArticleDOI
07 Sep 1997
TL;DR: Energy normalised cross correlation is used to maintain heading, to estimate confidence and to servo control a robot vehicle while following a path.
Abstract: Describes the use of appearance-based vision for defining visual processes for navigation: processes that transform images into commands and events. A family of visual processes is defined by associating the appearance of a scene from a given viewpoint with simple trajectories. Appearance is captured as a set of low-resolution images. Energy normalised cross correlation is used to maintain heading, to estimate confidence and to servo-control a robot vehicle while following a path. Experimental results are presented which compare results with a single camera, a pair of parallel cameras and a pair of divergent cameras. The most accurate (and robust) navigation is found with a pair of cameras which are slightly divergent.
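The matching primitive is straightforward to sketch: energy-normalised cross correlation between two equally sized patches, which is invariant to a positive gain and so tolerates global illumination scaling.

```python
import numpy as np

def norm_xcorr(a, b):
    """Energy-normalised cross correlation of two equally sized patches;
    1.0 means identical up to a positive gain factor."""
    a = a.ravel().astype(float)
    b = b.ravel().astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

patch = np.arange(16.0).reshape(4, 4)
```

In a navigation loop the score doubles as a confidence estimate: a low maximum correlation against all stored appearance images signals that the vehicle has drifted off the learned path.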

Journal ArticleDOI
01 Jun 1997
TL;DR: By considering the characteristics of block motions for typical image sequences, an intelligent classifier is proposed to separate blocks containing moving edges to improve on conventional intensity-based block matching approaches, and a fast and efficient block motion estimation algorithm is developed.
Abstract: Intensity-based block motion estimation and compensation algorithms are widely used to exploit temporal redundancies in video coding, although they suffer from several drawbacks. One of the problems is that blocks located on boundaries of moving objects are not estimated accurately. This causes poor motion-compensated prediction along the moving edges to which the human visual system is very sensitive. By considering the characteristics of block motions for typical image sequences, an intelligent classifier is proposed to separate blocks containing moving edges to improve on conventional intensity-based block matching approaches. The motion vectors of these blocks are computed using edge matching techniques, so that the motion-compensated frames are tied more closely to the physical features. The proposed method can then make use of this accurate motion information for edge blocks to compute the remaining non-edge blocks. Consequently, a fast and efficient block motion estimation algorithm is developed. Experimental results show that this approach gives a significant improvement in both the accuracy of motion-compensated frames and computational complexity, compared with traditional intensity-based block motion estimation methods.
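A baseline version of the two ingredients above can be sketched as follows: a block classifier and an exhaustive intensity-based block match. The gradient-magnitude threshold is an assumed stand-in for the paper's intelligent classifier, and the edge-matching step itself is not reproduced; the function names, block size, search range, and threshold are all illustrative:

```python
import numpy as np

def is_edge_block(block, thresh=10.0):
    # Flag a block as edge-containing when its mean gradient
    # magnitude exceeds a (hypothetical) threshold.
    gy, gx = np.gradient(block.astype(float))
    return float(np.mean(np.hypot(gx, gy))) > thresh

def block_match(cur, ref, bx, by, bsize=8, search=4):
    # Exhaustive SAD search for the motion vector of the block
    # whose top-left corner in the current frame is (bx, by).
    blk = cur[by:by + bsize, bx:bx + bsize].astype(float)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(blk - ref[y:y + bsize, x:x + bsize].astype(float)).sum()
            if sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv
```

In the scheme described by the abstract, blocks for which `is_edge_block` fires would instead be matched on edge features, and their vectors reused to constrain the search for the remaining non-edge blocks.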

01 Mar 1997
TL;DR: A model that uses iconic scene representations derived from oriented spatiochromatic filters at multiple scales for gaze targeting that indicates excellent agreement between eye movements predicted by the model and those recorded from human subjects is proposed.
Abstract: Visual cognition depends critically on the moment-to-moment orientation of gaze. Gaze is changed by saccades, rapid eye movements that orient the fovea over targets of interest in a visual scene. Saccades are ballistic; a prespecified target location is computed prior to the movement and visual feedback is precluded. Once a target is fixated, gaze is typically held for about 300 milliseconds, although it can be held for both longer and shorter intervals. Despite these distinctive properties, there has been no specific computational model of the gaze targeting strategy employed by the human visual system during visual cognitive tasks. This paper proposes such a model that uses iconic scene representations derived from oriented spatiochromatic filters at multiple scales. Visual search for a target object proceeds in a coarse-to-fine fashion with the target's largest scale filter responses being compared first. Task-relevant target locations are represented as saliency maps which are used to program eye movements. Once fixated, targets are remembered by using spatial memory in the form of object-centered maps. The model was empirically tested by comparing its performance with actual eye movement data from human subjects in natural visual search tasks. Experimental results indicate excellent agreement between eye movements predicted by the model and those recorded from human subjects.
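The coarse-to-fine search over a saliency map described above can be sketched in simplified form. This sketch substitutes plain normalised template correlation for the paper's oriented spatiochromatic filter responses, and 2x2 block averaging for a proper Gaussian pyramid; all function names and the refinement margin are assumptions:

```python
import numpy as np

def correlation_saliency(scene, target):
    # Slide the target over the scene; mean-subtracted normalised
    # correlation at each offset forms a simple saliency map.
    th, tw = target.shape
    t = target.astype(float) - target.mean()
    sal = np.zeros((scene.shape[0] - th + 1, scene.shape[1] - tw + 1))
    for y in range(sal.shape[0]):
        for x in range(sal.shape[1]):
            p = scene[y:y + th, x:x + tw].astype(float)
            p = p - p.mean()
            denom = np.sqrt((t * t).sum() * (p * p).sum())
            sal[y, x] = (t * p).sum() / denom if denom else 0.0
    return sal

def downsample(img):
    # 2x2 block average: a crude stand-in for one pyramid level.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def coarse_to_fine_fixation(scene, target):
    # Locate the target at the coarse scale first, then refine the
    # estimate at full resolution within a small surrounding window.
    sal_c = correlation_saliency(downsample(scene), downsample(target))
    cy, cx = np.unravel_index(np.argmax(sal_c), sal_c.shape)
    m = 4  # refinement margin (in full-resolution pixels), assumed
    th, tw = target.shape
    y0, x0 = max(0, 2 * cy - m), max(0, 2 * cx - m)
    win = scene[y0:y0 + th + 2 * m, x0:x0 + tw + 2 * m]
    sal_f = correlation_saliency(win, target)
    fy, fx = np.unravel_index(np.argmax(sal_f), sal_f.shape)
    return y0 + fy, x0 + fx
```

The returned coordinates play the role of the next saccade target; in the full model the saliency map would combine multi-scale filter responses and task relevance rather than a single correlation score.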

Journal ArticleDOI
TL;DR: A frequency- and contrast-dependent metric in the DCT domain is developed using a fully non-linear and suprathreshold contrast perception model: the Information Allocation Function (IAF) of the visual system.