
Showing papers on "Human visual system model" published in 1988


Journal ArticleDOI
14 Jan 1988-Nature
TL;DR: The human visual system can rapidly and accurately derive the three-dimensional orientation of surfaces from variations in image intensity alone; this ability to perceive shape from shading is one of the most important yet poorly understood aspects of human vision.
Abstract: The human visual system can rapidly and accurately derive the three-dimensional orientation of surfaces by using variations in image intensity alone. This ability to perceive shape from shading is one of the most important yet poorly understood aspects of human vision. Here we present several findings which may help reveal computational mechanisms underlying this ability. First, we find that perception of shape from shading is a global operation which assumes that there is only one light source illuminating the entire visual image. This implies that if two identical objects are viewed simultaneously and illuminated from different angles, then we would be able to perceive three-dimensional shape accurately in only one of them at a time. Second, three-dimensional shapes that are defined exclusively by shading can provide tokens for the perception of apparent motion, suggesting that the motion mechanism is remarkably versatile in the kinds of inputs it can use. Lastly, the occluding edges which delineate an object from its background can also powerfully influence the perception of three-dimensional shape from shading.

695 citations


Journal ArticleDOI
TL;DR: A model is presented that combines the outputs of a set of spatiotemporal motion-energy filters to estimate image velocity and a measure of image-flow uncertainty is formulated; preliminary results indicate that this uncertainty measure may be used to recognize ambiguity due to the aperture problem.
Abstract: A model is presented, consonant with current views regarding the neurophysiology and psychophysics of motion perception, that combines the outputs of a set of spatiotemporal motion-energy filters to estimate image velocity. A parallel implementation computes a distributed representation of image velocity. A measure of image-flow uncertainty is formulated; preliminary results indicate that this uncertainty measure may be used to recognize ambiguity due to the aperture problem. The model appears to deal with the aperture problem as well as the human visual system since it extracts the correct velocity for some patterns that have large differences in contrast at different spatial orientations.

573 citations


Journal ArticleDOI
TL;DR: A solution to the correspondence problem for stereopsis is proposed that uses the differences in the complex phase of local spatial frequency components; the algorithm can discriminate disparities significantly smaller than the width of a pixel.
Abstract: A solution to the correspondence problem for stereopsis is proposed using the differences in the complex phase of local spatial frequency components. One-dimensional spatial Gabor filters (Gabor 1946; Marcelja 1980), at different positions and spatial frequencies are convolved with each member of a stereo pair. The difference between the complex phase at corresponding points in the two images is used to find the stereo disparity. Disparity values are combined across spatial frequencies for each image location. Three-dimensional depth maps have been computed from real images under standard lighting conditions, as well as from random-dot stereograms (Julesz 1971). The algorithm can discriminate disparities significantly smaller than the width of a pixel. It is possible that a similar mechanism might be used in the human visual system.
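The phase-difference idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the kernel truncation at three standard deviations, and the use of a single spatial frequency (the abstract combines several) are all assumptions made for illustration. For a signal shifted by d, the phase of a complex Gabor response at frequency omega changes by roughly omega * d, so dividing the phase difference by omega gives a sub-pixel disparity estimate, valid while |omega * d| stays below pi.

```python
import numpy as np

def gabor_kernel(omega, sigma):
    """Complex 1-D Gabor kernel: Gaussian envelope times a complex carrier.

    Truncating at 3 * sigma is an illustrative choice, not from the abstract.
    """
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    return np.exp(-x**2 / (2 * sigma**2)) * np.exp(1j * omega * x)

def phase_disparity(left, right, omega, sigma):
    """Per-sample disparity estimate from the Gabor phase difference.

    Assumes right(x) is approximately left(x - d) and |omega * d| < pi.
    """
    kernel = gabor_kernel(omega, sigma)
    resp_left = np.convolve(left, kernel, mode="same")
    resp_right = np.convolve(right, kernel, mode="same")
    # The angle of resp_left * conj(resp_right) is the wrapped phase difference.
    dphi = np.angle(resp_left * np.conj(resp_right))
    return dphi / omega

# Usage: a sinusoidal scan line shifted by 0.3 pixel.
x = np.arange(256)
omega = 2 * np.pi / 16
left = np.cos(omega * x)
right = np.cos(omega * (x - 0.3))
disparity = phase_disparity(left, right, omega, sigma=8.0)
estimate = np.median(disparity[64:192])  # ignore samples near the borders
```

The estimate is only reliable where the signal has energy near the filter frequency, which is one reason to combine several frequencies as the abstract describes.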

295 citations


Journal ArticleDOI
TL;DR: This paper shows how the seemingly intractable problem of visual perception can be converted into a much simpler problem by the application of several physical and biological constraints and argues strongly for the validity of the computational approach to modeling the human visual system.
Abstract: This paper demonstrates how serious consideration of the deep complexity issues inherent in the design of a visual system can constrain the development of a theory of vision. We first show how the seemingly intractable problem of visual perception can be converted into a much simpler problem by the application of several physical and biological constraints. For this transformation, two guiding principles are used that are claimed to be critical in the development of any theory of perception. The first is that analysis at the ‘complexity level’ is necessary to ensure that the basic space and performance constraints observed in human vision are satisfied by a proposed system architecture. Second, the ‘maximum power/minimum cost principle’ ranks the many architectures that satisfy the complexity level and allows the choice of the best one. The best architecture chosen using this principle is completely compatible with the known architecture of the human visual system, and in addition, leads to several predictions. The analysis provides an argument for the computational necessity of attentive visual processes by exposing the computational limits of bottom-up early vision schemes. Further, this argues strongly for the validity of the computational approach to modeling the human visual system. Finally, a new explanation for the pop-out phenomenon so readily observed in visual search experiments, is proposed.

143 citations


Journal ArticleDOI
TL;DR: A color space defined by the fundamental spectral sensitivity functions of the human visual system is used and specific guidelines are offered for the design of computer graphics displays that will accommodate almost all color-deficient users.
Abstract: A color space defined by the fundamental spectral sensitivity functions of the human visual system is used to assist in the design of computer graphics displays for color-deficient users. The functions are derived in terms of the CIE standard observer color-matching functions. The Farnsworth-Munsell 100-hue test, a widely used color vision test administered using physical color samples, is then implemented on a digitally controlled color television monitor. The flexibility of this computer graphics medium is then used to extend the Farnsworth-Munsell test in a way that improves the specificity of the diagnoses rendered by the test. The issue of how the world appears to color-deficient observers is addressed, and a full-color image is modified to represent a color-defective view of the scene. Specific guidelines are offered for the design of computer graphics displays that will accommodate almost all color-deficient users.

121 citations


Patent
19 Oct 1988
TL;DR: In this article, a binary bit image pattern having a minimum visual noise for each density level in an image is produced by employing a stochastic combinatorial minimization technique and a human visual system modulation transfer function (MTF) weighting function.
Abstract: A digital halftone image is produced by providing a binary bit image pattern having a minimum visual noise for each density level in an image. The patterns are produced by employing a stochastic combinatorial minimization technique and a human visual system modulation transfer function (MTF) weighting function to generate a halftone pattern for each density level of the multi-level digital image signal. A halftone image is produced by modularly addressing these patterns with each pixel value in the image.

94 citations


Patent
20 May 1988
TL;DR: A system for compressing and transmitting a digital image signal over a limited-bandwidth communication channel transform codes the image values and quantizes the transform coefficients according to a two-dimensional model of the sensitivity of the human visual system.
Abstract: A system for compressing and transmitting a digital image signal over a limited-bandwidth communication channel transform codes the image values and quantizes the transform coefficients according to a two-dimensional model of the sensitivity of the human visual system. The model of the human visual system is characterized by being less sensitive to diagonally oriented spatial frequencies than to horizontally or vertically oriented spatial frequencies, thereby achieving increased compression of the image.
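As a rough sketch of the idea in this abstract, the block below quantizes the 2-D DCT coefficients of an 8×8 block with step sizes that grow with spatial frequency and grow further for diagonal orientations. The patent's actual transform and sensitivity model are not given here, so the DCT, the function names, and the step-size formula (including `base`, `radial_gain`, and `diagonal_gain`) are hypothetical choices made purely for illustration.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def hvs_step_sizes(n=8, base=4.0, radial_gain=1.0, diagonal_gain=0.5):
    """Hypothetical quantizer steps: coarser at high and at diagonal frequencies."""
    u = np.arange(n)[:, None].astype(float)   # vertical frequency index
    v = np.arange(n)[None, :].astype(float)   # horizontal frequency index
    radius = np.hypot(u, v)
    theta = np.arctan2(u, v)
    obliqueness = np.sin(2 * theta) ** 2      # 0 on the axes, 1 at 45 degrees
    return base * (1 + radial_gain * radius) * (1 + diagonal_gain * obliqueness)

def code_block(block, steps):
    """Transform, quantize, and reconstruct one square block."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    indices = np.round(coeffs / steps)        # integers that would be transmitted
    recon = c.T @ (indices * steps) @ c
    return indices, recon

# Usage on a random 8x8 block of pixel values.
rng = np.random.default_rng(0)
block = rng.uniform(0, 255, size=(8, 8))
steps = hvs_step_sizes()
indices, recon = code_block(block, steps)
```

Larger steps on diagonal coefficients send more of them to zero, which is where the extra compression would come from under this assumed model.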

78 citations


Journal ArticleDOI
01 Mar 1988
TL;DR: The authors explain the use of units responsive to Gabor signals in vision, considered as a process in inference from the retinal signals to a symbolic description, which is derived directly from the fundamental constraints on visual inference.
Abstract: Recent physiological research has indicated that the visual system makes use of units responsive to Gabor signals in the analysis of visual stimuli. Such functions effect a tradeoff between pure spatial- and frequency-domain descriptions. The authors explain the use of such representations in vision, considered as a process in inference from the retinal signals to a symbolic description. The appropriate mathematical structure for the inference is that of the subspaces of the signal vector space, a feature which it shares with quantum mechanics. The theory is derived directly from the fundamental constraints on visual inference. It is then shown to be consistent with many of the known properties of the visual system. In particular, a major feature of the inference system, the occurrence of interference effects, has already been observed in visual system operation.

45 citations


Journal Article
TL;DR: This work finds that perception of shape from shading is a global operation which assumes that there is only one light source illuminating the entire visual image, and that if two identical objects are viewed simultaneously and illuminated from different angles, then the authors would be able to perceive three-dimensional shape accurately in only one of them at a time.
Abstract: The human visual system can rapidly and accurately derive the three-dimensional orientation of surfaces by using variations in image intensity alone. This ability to perceive shape from shading is one of the most important yet poorly understood aspects of human vision. Here we present several findings which may help reveal computational mechanisms underlying this ability. First, we find that perception of shape from shading is a global operation which assumes that there is only one light source illuminating the entire visual image. This implies that if two identical objects are viewed simultaneously and illuminated from different angles, then we would be able to perceive three-dimensional shape accurately in only one of them at a time. Second, three-dimensional shapes that are defined exclusively by shading can provide tokens for the perception of apparent motion, suggesting that the motion mechanism is remarkably versatile in the kinds of inputs it can use. Lastly, the occluding edges which delineate an object from its background can also powerfully influence the perception of three-dimensional shape from shading.

40 citations


Proceedings ArticleDOI
24 Jun 1988
TL;DR: In this article, it was shown that the appearance of the image, independently of the actual spatial energy distribution on the retina, affects the ability of the human visual system to process information.
Abstract: The NTSC standard for color television codes the chrominance signals at a lower spatial resolution than it codes the luminance signal. These differential resolutions result in a smearing of the colors in the scene relative to the edges that define objects, but television viewers are rarely aware of this degradation of the image because the human visual system also codes chrominance (i.e. hue) at a lower spatial resolution than it codes luminance (i.e. edges). Given the resolution difference for chrominance and luminance edges, a model of visual perception must explain why human observers do not perceive the color of objects flowing beyond the luminance edges of those objects. The internal chromatic aspects of objects, which may be determined at chrominance object-boundaries, may be constrained by the perceived spatial luminance boundaries of those objects. In the experiments that we will describe, a spatial chrominance and luminance boundary is prevented from moving on the viewer's retina by moving its image synchronously with eye movements. When the edge is stabilized on the retina, the appearance of the image depends upon the enclosing boundaries that are not stabilized on the retina. We have found that the appearance of the image, independently of the image's actual spatial energy distribution on the retina, affects the ability of the human visual system to process information. For example, the viewer's flicker sensitivity depends upon the perceived color of the image and not its actual spectral energy distribution. The same is true for the perceived color of a small spot imaged on the stabilized fields.

38 citations


Book ChapterDOI
01 Dec 1988
TL;DR: This chapter surveys results of a variety of research work conducted in this area using different depth cues, including depth from stereopsis; structure from motion parallax and optical flow, shape from shading, texture, and surface contours; and shape from occluding contours.
Abstract: In recent years, an important area of research in computer vision has been the recovery or inference of the 3-D information about a scene from its 2-D image. In light of the fascinating power of humans in inferring the 3-D structure of objects from visual images, a great deal of effort has been directed towards the understanding of each module in the human visual system. The results of these efforts have yielded mathematical models for most of these modules. In this chapter, we survey results of a variety of research work conducted in this area using different depth cues, including depth from stereopsis; structure from motion parallax and optical flow, shape from shading, texture, and surface contours; and shape from occluding contours. Also included are two nonanthropomorphic approaches, i.e., structure from volume intersection and shape from spatial encoding, which are closely tied to cues used in the human visual system for inferring 3-D structure from 2-D images. We present the strengths and shortcomings of each approach, and briefly discuss the possibility of combining different approaches in order to obtain more robust and reliable results, and suggest the direction of future research.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Experimental results indicate that there is a significant improvement in the perceived quality of the images encoded in the visual model, although the measured distortion between the images is slightly better for the intensity encoded images.
Abstract: A method of vector quantization which uses a color model of the human visual system to improve the quality of encoded images is presented. In this method, the images are first transformed into a 'perceptual space' and then encoded. After reconstruction, the image is transformed back into an intensity representation for viewing. A comparison is made between images encoded at 1.125 bits per pixel with and without the visual model using both an SNR measure and visual observation. Experimental results indicate that there is a significant improvement in the perceived quality of the images encoded in the visual model, although the measured distortion between the images is slightly better for the intensity encoded images.

Journal ArticleDOI
TL;DR: The human visual system has a set of stereoscopic foreoptics that are instantly and automatically focusable to a few centimeters and are fully corrected for geometric aberrations, a servo‐controlled two‐axis scanning mechanism, a millisecond framing rate, sensitivity to brightnesses varying by a factor of 100 billion, the ability to detect a single photon, nearly 100% quantum efficiency and a spatial as well as temporal image processor that could not be matched by the fastest supercomputer.
Abstract: Vision is awe inspiring. The wondrous nature of this sensory process becomes clear when we consider just a few of its features. In the terminology of today's technology, the human visual system has a set of stereoscopic foreoptics that are instantly and automatically focusable to a few centimeters and are fully corrected for geometric aberrations, a servo‐controlled two‐axis scanning mechanism, a millisecond framing rate, sensitivity to brightnesses varying by a factor of 100 billion, the ability to detect a single photon, nearly 100% quantum efficiency and a spatial as well as temporal image processor that could not be matched by the fastest supercomputer.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A new image coding algorithm is presented which includes a technique for segmenting image sequences and a modified adaptive 3-D arithmetic coder (MAAC); with these enhancements, it will be possible to transmit an image sequence at 30 frames per second over a digital telephone line.
Abstract: With the availability of a 144 Kbits/sec user's digital access to the integrated service digital network (ISDN), transmission of video data over the telephone network will become a reality. However, the required compression is extremely high and improving the visual quality of the decoded image data continues to be a challenge. The authors present a new image coding algorithm which includes a technique for segmenting image sequences and a modified adaptive 3-D arithmetic coder (MAAC). With these enhancements, it will be possible to transmit an image sequence at 30 frames per second over a digital telephone line.

Journal ArticleDOI
TL;DR: In designing imagery systems, simply increasing the spatial and temporal addressability and resolution beyond limits set by the human visual system will have a negligible impact on image quality, whereas effective use of antialiasing techniques could allow visual information about object features to be presented with great fidelity.
Abstract: The assumption that antialiasing destroys useful visual information about object features is challenged in three experiments that examine the effects of antialiasing on the visual information for object location and motion. The results show that proper antialiasing eliminates the spurious visual information produced by sampling processes in image synthesis and allows the viewer's visual system to produce a precise representation of object location and a continuous representation of object motion. This suggests that in designing imagery systems, simply increasing the spatial and temporal addressability and resolution beyond limits set by the human visual system will have a negligible impact on image quality, but that effective use of antialiasing techniques could allow visual information about object features to be presented with great fidelity.

Journal ArticleDOI
TL;DR: This paper proposes a relaxation algorithm for feature point matching where the formation of smooth trajectories over space and time is favored and is presented to demonstrate the merit of our approach.

Proceedings ArticleDOI
25 May 1988
TL;DR: A real-time image processing system called SIPS (Sight Information Processing System) has been designed and constructed and has been used for research on video image processing such as low-level vision processing and image coding.
Abstract: A real-time image processing system called SIPS (Sight Information Processing System) has been designed and constructed. SIPS realizes a high-speed performance of 2.7 Giga operations/s through parallel and distributed processing of its 82 independently controlled processors. SIPS provides both real-time processing of color video images and a high degree of flexibility. SIPS became operational in 1985 and has been used for research on video image processing such as low-level vision processing and image coding. The architectures and possible applications of SIPS are presented.

Journal ArticleDOI
TL;DR: It is proposed that this selectivity is an essential feature for any system to analyse raw image sequences of moving, dividing cells as the computational expense of allowing all possible processing to proceed is enormous.

Book ChapterDOI
01 Jan 1988
TL;DR: One point of convergence is found in current “bootstrapping” procedures that analyze visual information into minimal stimulus conditions and then seek to model processes by which these conditions can be transformed into relevant environmental properties.
Abstract: An important trend in the visual sciences is the emerging convergence between psychophysical and computational approaches to visual information processing (Beck, Hope, and Rosenfeld, 1983). Each field is concerned with similar issues and problems; however, each applies a sufficiently different approach to provide complementary lines of investigation. One point of convergence is found in current “bootstrapping” procedures that analyze visual information into minimal stimulus conditions and then seek to model processes by which these conditions can be transformed into relevant environmental properties.

Proceedings ArticleDOI
24 Jul 1988
TL;DR: DETE, a computer implementation of a neurally inspired, computational model of the associative interactions between two types of cognitive functions in humans: the processing of visual and verbal information, is designed to learn to associate language descriptions of objects moving in a visual field with those objects, and thus explore the grounding problem.
Abstract: A description is given of a neurally inspired, computational model of the associative interactions between two types of cognitive functions in humans: the processing of visual and verbal information. DETE, a computer implementation of the model, is designed to learn to associate language descriptions of objects moving in a visual field with those objects, and thus explore the grounding problem, i.e. how language semantics maps to sensory experiences. The model consists of connectionist and microsymbolic modules responsible for processing of the visual and verbal inputs and interacting through an associative module coupled with a mechanism for selective attention. DETE accepts input from two modalities: visual, which consists of a continuous sequence of visual scenes showing the behavior of simple 2-D shaped objects in a square region, and verbal, i.e. occasional streams of English sentences describing the ever-changing visual scenes. The overt behavior of the system is language generation and visual imagination.

Proceedings ArticleDOI
25 Oct 1988
TL;DR: Results showed that the subjective quality of the processed images is significantly improved even at a low bit rate, and an adaptive cosine transform coding scheme capable of real-time operation is described.
Abstract: An adaptive cosine transform coding scheme capable of real-time operation is described. It employs an adaptive quantization scheme where the quantizer range is dynamically scaled by a feedback parameter from the rate buffer. To take into account the human visual characteristics, human visual system (HVS) properties are incorporated into the coding scheme. Results showed that the subjective quality of the processed images is significantly improved even at a low bit rate of 0.15 bit per pixel (bpp). Two images were coded with the adaptive scheme achieving an average of 0.2 bpp with very little perceivable degradation.

Journal ArticleDOI
TL;DR: A two-stage coding system for reducing the high data rate of a digital HDTV signal (≈1 Gbit/s) is described; spatio-temporal subsampling of the interlaced source signal is used in conjunction with an intrafield DPCM of the remaining samples.

Proceedings ArticleDOI
22 Mar 1988
TL;DR: A method of real-time visual attention processing for robots performing visual guidance is described, based on a novel vision processor, the multi-window vision system developed at the University of Tokyo; two applications show the potential of local visual processing for robotic attention processing.
Abstract: This paper describes a method of real-time visual attention processing for robots performing visual guidance. This robot attention processing is based on a novel vision processor, the multi-window vision system that was developed at the University of Tokyo. The multi-window vision system is unique in that it only processes visual information inside local area windows. These local area windows are quite flexible in their ability to move anywhere on the visual screen, change their size and shape, and alter their pixel sampling rate. By using these windows for specific attention tasks, it is possible to perform high speed attention processing. The primary attention skills of detecting motion, tracking an object, and interpreting an image are all performed at high speed on the multi-window vision system. A basic robotic attention scheme using the attention skills was developed. The attention skills involved detection and tracking of salient visual features. The tracking and motion information thus obtained was utilized in producing the response to the visual stimulus. The response of the attention scheme was quick enough to be applicable to the real-time vision processing tasks of playing a video 'pong' game, and later using an automobile driving simulator. By detecting the motion of a 'ball' on a video screen and then tracking the movement, the attention scheme was able to control a 'paddle' in order to keep the ball in play. The response was faster than a human's, allowing the attention scheme to play the video game at higher speeds. Further, in the application to the driving simulator, the attention scheme was able to control both direction and velocity of a simulated vehicle following a lead car. These two applications show the potential of local visual processing in its use for robotic attention processing.

Proceedings ArticleDOI
05 Jun 1988
TL;DR: A novel approach to machine-based visual perception of monocular images is described and initial recognition results are presented.
Abstract: A novel approach to machine-based visual perception of monocular images is described and initial recognition results are presented. The approach uses a recursive procedure to generate a series of reconstructed versions of the raw video image. The procedure is motivated by certain perceptual organization functions of the human visual system. Recognition of object categories is attempted at each step by comparing the newly generated regions to stored object categories. Category retrieval is carried out using a software-based content-addressing scheme which provides access to complete object representations in memory using incomplete portions of the representation. The category-representation scheme is sufficiently general to allow a variety of poorly correlated images of specific category examples to be represented and recognized by a single general category representation. These properties are illustrated using a group of distorted, defective, and idealized images of ASCII A's and handguns.

Journal ArticleDOI
Neville Drasdo
TL;DR: The Snellen test has been the most popular clinical measurement of spatial vision for over a century, but it does not fully express the visual ability of an individual.

Journal ArticleDOI
TL;DR: An algorithm based on the characteristics of the human visual system is presented that automatically selects the thresholds for detecting the significant edges as perceived by human beings; it is found to provide a satisfactory improvement in performance over the conventional edge detection process for a wide range of input images.
Abstract: An algorithm based on the characteristics of the human visual system is presented by which it is possible to select automatically (without human intervention) the thresholds for detecting the significant edges as perceived by human beings. The threshold value changes with the background intensity according to the criterion governed by the characteristic of one of the De Vries–Rose, Weber, and saturated regions. The effect of background size and splitting image (dynamic thresholding), and a provision for reducing the computation time are also included in the study. The algorithm is found to provide a satisfactory improvement in the performance over the conventional edge detection process for a wide range of input images.
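A background-dependent threshold of this kind might be sketched as follows. The abstract gives no formulas, so this sketch assumes the commonly cited forms of the three regions (threshold proportional to the square root of the background intensity in the De Vries–Rose region, to the intensity itself in the Weber region, and to its square in the saturated region). The region boundaries `b1` and `b2`, the Weber fraction `k_weber`, and the simple neighbour-difference edge measure are hypothetical values chosen for illustration, not the authors' parameters.

```python
import numpy as np

def contrast_threshold(background, b1=30.0, b2=200.0, k_weber=0.02):
    """Assumed just-noticeable intensity difference for a background level.

    The three branches are scaled so the curve is continuous at b1 and b2.
    """
    b = np.asarray(background, dtype=float)
    de_vries_rose = k_weber * np.sqrt(b1 * b)   # grows with sqrt(B)
    weber = k_weber * b                         # grows with B
    saturated = k_weber * b**2 / b2             # grows with B squared
    return np.where(b < b1, de_vries_rose,
                    np.where(b <= b2, weber, saturated))

def significant_edges(image, **kwargs):
    """Mark horizontal neighbour differences that exceed the local threshold."""
    img = np.asarray(image, dtype=float)
    diff = np.abs(np.diff(img, axis=1))
    background = 0.5 * (img[:, 1:] + img[:, :-1])  # mean of the two neighbours
    return diff > contrast_threshold(background, **kwargs)

# Usage: a step of 10 grey levels is kept, a step of 1 grey level is not.
image = np.array([[100.0, 100.0, 110.0, 110.0],
                  [100.0, 101.0, 101.0, 101.0]])
edges = significant_edges(image)
```

Because the threshold follows the background level, the same intensity step can count as an edge in a mid-grey area and be rejected in a very bright one.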

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A class of image quantizers called model-testing vector quantizers is introduced, which are finite-state quantizers with the additional ability to make a decision about whether the current input conforms to the assumed finite-state model or not.
Abstract: A class of image quantizers called model-testing vector quantizers is introduced. These are finite-state quantizers with the additional ability to make a decision about whether the current input conforms to the assumed finite-state model or not. The nonconforming component is coded at relatively lower bit rates by utilizing well-known linear and nonlinear properties of the human visual system. An adaptive spatial and intensity quantization procedure is used that removes the perceptually irrelevant information from the nonconforming component. The differential brightness discrimination of the eye is also utilized. Resulting coded images are indistinguishable from the originals at bit rates that range from 1.0 to 3.5 bits/pixel for a wide variety of 512×512 images. These bit rates can be regarded as upper bounds for the perceptual entropy of the corresponding images.

Proceedings ArticleDOI
16 Dec 1988
TL;DR: A preliminary image quality measure which attempts to take into account the sensitivities of the human visual system (HVS) is described and allows experimentation with numerous parameters of the HVS model to determine the optimum set for which the highest correlation with subjective evaluations can be achieved.
Abstract: A preliminary image quality measure which attempts to take into account the sensitivities of the human visual system (HVS) is described. The main sensitivities considered are the background illumination-level and spatial frequency sensitivities. Given a digitized image the algorithm produces, among several other figures of merit, a plot of the information content (IC) versus the resolution. The IC for a given resolution is defined here as the sum of the weighted spectral components at that resolution. The HVS normalization is done via first intensity-remapping the image by a monotone increasing function representing the background illumination-level sensitivity, followed by a spectral filtering via an HVS-derived weighting function representing the spatial frequency sensitivity. The developed quality measure is conveniently parameterized and interactive. It allows experimentation with numerous parameters of the HVS model to determine the optimum set for which the highest correlation with subjective evaluations can be achieved.
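The two-step normalization and the IC-versus-resolution curve described in this abstract could look roughly like the sketch below. The abstract names neither the remapping function nor the weighting function, so both are stand-ins: a power law with exponent `gamma` for the monotone intensity remapping, and the Mannos–Sakrison contrast sensitivity formula for the spatial-frequency weighting. The function names, the `nyquist_cpd` viewing-geometry parameter, and the number of bands are likewise hypothetical.

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Mannos-Sakrison contrast sensitivity, f in cycles per degree (stand-in weighting)."""
    f = np.asarray(f, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def information_content(image, nyquist_cpd=32.0, gamma=1.0 / 3.0, n_bands=8):
    """Sum of HVS-weighted spectral magnitudes in each radial frequency band.

    Assumes a non-negative image; nyquist_cpd maps the sampling Nyquist
    frequency to cycles per degree for an assumed viewing distance.
    """
    img = np.asarray(image, dtype=float)
    # Step 1: monotone increasing intensity remapping (assumed power law).
    remapped = (img / max(img.max(), 1e-12)) ** gamma
    # Step 2: spectral weighting by the contrast sensitivity function.
    spectrum = np.abs(np.fft.fft2(remapped - remapped.mean()))
    fy = np.fft.fftfreq(img.shape[0])[:, None]   # cycles per pixel
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radial = np.hypot(fx, fy) / 0.5 * nyquist_cpd
    weighted = csf_mannos_sakrison(radial) * spectrum
    # Sum the weighted components band by band to get IC versus resolution.
    band_edges = np.linspace(0.0, nyquist_cpd, n_bands + 1)
    ic = np.array([weighted[(radial >= lo) & (radial < hi)].sum()
                   for lo, hi in zip(band_edges[:-1], band_edges[1:])])
    return band_edges, ic

# Usage: a 64x64 grating at 8 cycles per image width (8 cycles per degree here).
x = np.arange(64)
grating = np.tile(0.5 + 0.4 * np.cos(2 * np.pi * 8 * x / 64), (64, 1))
band_edges, ic = information_content(grating)
```

Plotting `ic` against the band centres would give the kind of IC-versus-resolution curve the abstract mentions, under these assumed sensitivity functions.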

Book ChapterDOI
01 May 1988
TL;DR: This paper reports on recent progress in Computer Vision by the Oxford Robotics Research Group and discusses in particular: edge and corner finding; shape from contour; parallel algorithms for computing shape representations; parallel architectures for computer vision; and the application of truth maintenance systems to recognise variable geometry objects in cluttered images.
Abstract: This paper reports on recent progress in Computer Vision by the Oxford Robotics Research Group. We discuss in particular: edge and corner finding; shape from contour; parallel algorithms for computing shape representations; parallel architectures for computer vision; and the application of truth maintenance systems to recognise variable geometry objects in cluttered images. Model-based vision and data-directed vision are discussed as extreme cases of architectures for vision systems.

Proceedings ArticleDOI
24 Jun 1988
TL;DR: This paper describes where well-understood aspects of colour psychophysics can be usefully applied in display design and discusses the measurements needed for human performance to play a greater role in display design for a wider range of tasks.
Abstract: It is common knowledge that the human engineering of displays is as important as electro-optical or software engineering. To make human engineering a routine part of display design we must have quantitative rules of thumb describing the performance of the human visual system. For colour, visual psychophysics is the scientific discipline that creates such measurements. Some aspects of colour psychophysics are well understood, mostly concerned with very low level processes in the visual system. This paper describes where they can be usefully applied in display design. It also discusses measurements needed for human performance to play a greater role in display design for a wider range of tasks.