
Showing papers on "Human visual system model published in 1989"


Journal ArticleDOI
TL;DR: A preliminary image quality measure that takes into account two major sensitivities of the human visual system (HVS) is described; the measure allows experimentation with numerous parameters of the HVS model to determine the set for which the highest correlation with subjective evaluations can be achieved.
Abstract: A preliminary image quality measure that takes into account two major sensitivities of the human visual system (HVS) is described. The sensitivities considered are background illumination level and spatial frequency sensitivities. Given a digitized monochrome image, the algorithm produces, among some other figures of merit, a plot of the information content (IC) versus the resolution in units of pixels. The IC is defined here as the sum of the weighted spectral components at an arbitrary specified resolution. The HVS normalization is done by first intensity remapping the image by a monotonically increasing function representing the background illumination level sensitivity, followed by a spectral filtering to compensate for the spatial frequency sensitivity. The developed quality measure is conveniently parameterized and interactive. It allows experimentation with numerous parameters of the HVS model to determine the optimum set for which the highest correlation with subjective evaluations can be achieved. The preliminary results are promising.
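The pipeline described above can be made concrete with a short sketch. This is not the authors' algorithm, only a toy illustration of the steps it names: a monotone intensity remap for background-illumination sensitivity, a CSF-like spectral weighting for spatial-frequency sensitivity, and an information-content sum up to a resolution cutoff. The `log1p` remap and the `r * exp(-8r)` weight are invustrative stand-ins for the paper's parameterized functions.

```python
import numpy as np

def information_content(img, cutoff):
    """Toy HVS-weighted information content: luminance remap,
    band-pass spectral weighting, then the sum of weighted spectral
    magnitudes within a radial frequency cutoff (cycles/pixel).
    The remap and the weighting function are illustrative guesses."""
    # Background-illumination sensitivity: a monotonically increasing remap
    remapped = np.log1p(img.astype(float))
    # Spectral components of the remapped image
    spec = np.abs(np.fft.fftshift(np.fft.fft2(remapped)))
    h, w = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    r = np.hypot(fx, fy)
    # CSF-like band-pass weight: zero at DC, peaks at mid frequencies
    weight = r * np.exp(-8.0 * r)
    mask = r <= cutoff
    return float((spec * weight * mask).sum())
```

Plotting this quantity against `cutoff` would give the IC-versus-resolution curve the abstract describes; by construction the curve is non-decreasing in the cutoff.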

116 citations


01 Jan 1989
TL;DR: The motivations of the multichannel models developed in psychophysiology, computer vision, and image processing are described, along with how they relate to the wavelet transform.
Abstract: In this paper we review recent multichannel models developed in psychophysiology, computer vision, and image processing. In psychophysiology, multichannel models have been particularly successful in explaining some low-level processing in the visual cortex. The expansion of a function into several frequency channels provides a representation which is intermediate between a spatial and a Fourier representation. We describe the mathematical properties of such decompositions and introduce the wavelet transform. We review the classical multiresolution pyramidal transforms developed in computer vision and show how they relate to the decomposition of an image into a wavelet orthonormal basis. In the last section we discuss the properties of the zero crossings of multifrequency channels. Zero-crossings representations are particularly well adapted for pattern recognition in computer vision.
I. INTRODUCTION
Within the last 10 years, multifrequency channel decompositions have found many applications in image processing. In the psychophysiology of human vision, multichannel models have also been particularly successful in explaining some low-level biological processes. The expansion of a function into several frequency channels provides a representation which is intermediate between a spatial and a Fourier representation. In harmonic analysis, this kind of transform appeared in the work of Littlewood and Paley in the 1930's. More research has recently been focused on this domain with the modeling of a new decomposition called the wavelet transform. In this paper we review the recent multichannel models developed in psychophysiology, computer vision, and image processing. We describe the motivations of the models within each of these disciplines and show how they relate to the wavelet transform.
In psychophysics and the physiology of human vision, evidence has been gathered showing that the retinal image is decomposed into several spatially oriented frequency channels. In the first section of this paper, we describe the experimental motivations for this model. Biological studies of human vision have always been a source of ideas for computer vision and image processing research. Indeed, the human visual system is generally considered to be an optimal image processor. The goal is not to imitate the processing implemented in the human brain, but rather to understand the motivations of such processing.
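The "spatially oriented frequency channels" above can be illustrated with one level of a 2-D Haar decomposition, the simplest wavelet orthonormal basis. This is a generic stand-in for the decompositions the paper surveys, not its specific construction; it splits an image into a coarse approximation and three oriented detail channels while preserving energy.

```python
import numpy as np

def haar2d_level(img):
    """One level of a 2-D Haar wavelet decomposition (even-sized image
    assumed): returns approximation plus horizontal, vertical, and
    diagonal detail channels. With the 1/2 normalization the transform
    is orthonormal, so total energy is preserved."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    ll = (a + b + c + d) / 2.0   # approximation (low-low)
    lh = (a + b - c - d) / 2.0   # horizontal detail
    hl = (a - b + c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

Recursing on `ll` yields the multiresolution pyramid; the three detail channels at each scale are the oriented frequency channels.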

114 citations


Patent
19 Jul 1989
TL;DR: In this article, an adaptive transform coding algorithm for a still image is proposed, where the image is divided into small blocks of pixels and each block of pixels is transformed using an orthogonal transform such as a discrete cosine transform.
Abstract: In accordance with our adaptive transform coding algorithm for a still image, the image is divided into small blocks of pixels and each block of pixels is transformed using an orthogonal transform such as a discrete cosine transform. The resulting transform coefficients are compressed and coded to form a bit stream for transmission to a remote receiver. The compression parameters for each block of pixels are chosen based on a busyness measure for the block such as the magnitude of the (K+1) th most significant transform coefficient. This enables busy blocks for which the human visual system is not sensitive to degradation to be transmitted at low bit rates while enabling other blocks for which the human visual system is sensitive to degradation to be transmitted at higher bit rates. Thus, the algorithm is able to achieve a tradeoff between image quality and bit rate.
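The busyness measure from this abstract, the magnitude of the (K+1)th most significant transform coefficient, is easy to sketch. The orthonormal DCT-II matrix below is standard; the choice of K and the quantization-step thresholds are illustrative assumptions, not values from the patent.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def busyness(block, K=3):
    """Magnitude of the (K+1)-th most significant DCT coefficient.
    A flat block concentrates all energy in the DC term, so the
    measure is ~0 there, correctly flagging it as non-busy."""
    D = dct_matrix(block.shape[0])
    coef = D @ block @ D.T
    mags = np.sort(np.abs(coef).ravel())[::-1]
    return mags[K]  # (K+1)-th largest, 0-indexed

def quant_step(block, thresh=5.0):
    """Coarser step for busy blocks, where the HVS tolerates
    degradation (thresholds are illustrative)."""
    return 8.0 if busyness(block) > thresh else 2.0
```

Busy blocks thus get larger quantization steps and fewer bits, which is the bit-rate/quality tradeoff the abstract describes.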

94 citations


Proceedings ArticleDOI
04 Oct 1989
TL;DR: Various methods of handling video data on the MediaBENCH are introduced and discussed to show how video data can be manipulated on visual database systems which deal with spatial and temporal factors.
Abstract: The importance of content-oriented visual user interfaces using video icons for visual database systems is clarified. The effectiveness of both still and live video images, especially for user's browsing and interaction, is shown by means of the MediaBENCH (hypermedia basic environment for computer and human interactions), which is a basic prototype multimedia database system. Various methods of handling video data on the MediaBENCH are introduced and discussed to show how video data can be manipulated on visual database systems which deal with spatial and temporal factors. A visual interface using video icons is well suited to video editing, presentation support, and other electronic video document systems.

86 citations


Journal ArticleDOI
TL;DR: This model permits the introduction of a well-justified contrast definition: physically, it is closely linked with logarithmic images; mathematically, it is set up in an algebraic structure.
Abstract: SUMMARY Logarithmic images, such as images obtained by transmitted light or those produced by the human visual system, differ from linear images. Their processing and analysis require consequently specific laws and structures. The latter have been developed in the concept of a logarithmic image processing (LIP) model (Jourlin & Pinoli, 1987, 1988; Pinoli, 1987a). This model permits the introduction of a well-justified contrast definition: from a physical point of view, it is closely linked with logarithmic images and from a mathematical point of view, it is set up in an algebraic structure. The applications presented at the end of this paper concern image preprocessing and segmentation. In particular, in the case of microscopic images, the proposed method of segmentation gives good results with transmitted light (thin foils in biology or transmitted electronic microscopy). However, images obtained by reflected light microscopy are not within the scope of this model.
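The LIP algebra referred to above has a compact concrete form. Under the usual LIP conventions, gray values live in [0, M), addition models the superposition of transmitted-light filters, and scalar multiplication is the continuous extension of repeated LIP addition. The formulas follow the published LIP model; M = 256 is chosen here for 8-bit images.

```python
M = 256.0  # upper bound of the gray-scale range [0, M)

def lip_add(f, g):
    """LIP addition: f (+) g = f + g - f*g/M.
    Models stacking two light-absorbing filters; the result
    stays inside [0, M) whenever both operands do."""
    return f + g - f * g / M

def lip_scalar(lam, f):
    """LIP scalar multiplication: lam (x) f = M - M*(1 - f/M)**lam.
    For integer lam this equals lam-fold LIP addition of f."""
    return M - M * (1.0 - f / M) ** lam
```

These operations give the vector-space structure in which the model's contrast definition is stated; note that `lip_add(f, 0) == f`, so 0 is the neutral gray level.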

73 citations


Proceedings ArticleDOI
01 Nov 1989
TL;DR: This paper reviews these multiresolution techniques and discusses how they may be usefully combined in the future to match the requirements of human perception.
Abstract: Important new techniques for representing and analyzing image data at multiple resolutions have been developed over the past several years. Closely related multiresolution structures and procedures have been developed more or less independently in diverse scientific fields. For example, pyramid and subband representations have been applied to image compression, and promise excellent performance and flexibility. Similar pyramid structures have been developed as models for the neural coding of images within the human visual system. The pyramid has been developed in the computer vision field as a general framework for implementing highly efficient algorithms, including algorithms for motion analysis and object recognition. In this paper I review these multiresolution techniques and discuss how they may be usefully combined in the future. Methods used in image compression, for example, should match the requirements of human perception, and future 'smart' transmission systems will need to perform rapid analysis in order to selectively encode the most critical information in a scene.

57 citations


Journal ArticleDOI
TL;DR: This study investigates the ability of observers to see the 3-D shape of an object using motion cues, so-called structure-from-motion (SFM), and shows that the human visual system integrates motion information spatially and temporally as part of the process for computing SFM.
Abstract: Although it is appreciated that humans can use a number of visual cues to perceive the three-dimensional (3-D) shape of an object, for example, luminance, orientation, binocular disparity, and motion, the exact mechanisms employed are not known (De Yoe and Van Essen 1988). An important approach to understanding the computations performed by the visual system is to develop algorithms (Marr 1982) or neural network models (Lehky and Sejnowski 1988; Siegel 1987) that are capable of computing shape from specific cues in the visual image. In this study we investigated the ability of observers to see the 3-D shape of an object using motion cues, so-called structure-from-motion (SFM). We measured human performance in a two-alternative forced choice task using novel dynamic random-dot stimuli with limited point lifetimes. We show that the human visual system integrates motion information spatially and temporally (across several point lifetimes) as part of the process for computing SFM. We conclude that SFM algorithms must include surface interpolation to account for human performance. Our experiments also provide evidence that local velocity information, and not position information derived from discrete views of the image (as proposed by some algorithms), is used to solve the SFM problem by the human visual system.

44 citations


Book ChapterDOI
01 Sep 1989
TL;DR: The computational approach to the study of vision inquires directly into the sort of information processing needed to extract important information from the changing visual image, such as the three-dimensional structure and movement of objects in the scene, or the color and texture of object surfaces.
Abstract: The computational approach to the study of vision inquires directly into the sort of information processing needed to extract important information from the changing visual image---information such as the three-dimensional structure and movement of objects in the scene, or the color and texture of object surfaces. An important contribution that computational studies have made is to show how difficult vision is to perform, and how complex are the processes needed to perform visual tasks successfully. This article reviews some computational studies of vision, focusing on edge detection, binocular stereo, motion analysis, intermediate vision, and object recognition.

29 citations


Proceedings ArticleDOI
20 Mar 1989
TL;DR: A computational model of visual motion discrimination in the human visual system is being developed, with application to interpretation of moving light displays, and the nature of memory structures devoted to representation and recognition of visual motion sequences is explored.
Abstract: A computational model of visual motion discrimination in the human visual system is being developed, with application to interpretation of moving light displays. The author provides an overview of the current state of the model and discusses ideas that are being pursued. The computational model incorporates bottom-up feature computation, scenario indexing, and top-down contextual information flow. Emphasis is on the intermediate and higher-level visual processes, assuming the work of others for the low-level processes. The nature of memory structures devoted to representation and recognition of visual motion sequences is explored.

24 citations


01 Jan 1989
TL;DR: This work first detects edges in an image, then groups the edges based on the structural relationships among them; the groupings are treated as elements in the image and are in turn grouped again.
Abstract: Humans can easily group the elements in an image on such structural relationships as collinearity, parallelism, and symmetry. This phenomenon is called perceptual organization. To effectively use perceptual organization for computer vision, we require techniques for detecting such groupings in images, and techniques for utilizing the groupings for other stages of the visual process. In this work, we first detect edges in an image. We then group the edges based on the structural relationships among them. These groupings are treated as elements in the image, and are in turn grouped again. Thus, a hierarchy of such structural groupings is formed. The structural relationships used for the grouping are determined by the shapes of the objects being viewed. When the objects being viewed are known, we detect only those geometrical relationships which are found in those objects. To handle unknown objects, we detect the same geometrical relationships as the human visual system. We use the edge groupings to perform various visual tasks: for obtaining depth information stereooptically, for extracting from the image the objects being viewed, and for describing the shapes of the detected objects. (Copies available exclusively from Micrographics Department, Doheny Library, USC, Los Angeles, CA 90089-0182.)

18 citations


Journal ArticleDOI
TL;DR: In this article, the decomposition of visual imagery ability into subcomponents is discussed, focusing on the question of whether vivid and non-vivid imagers differ at a primarily data-driven or conceptually driven processing level.
Abstract: This paper is concerned with the decomposition of visual imagery ability in subcomponents. Basically, it is assumed that visual imagery consists of several components which are relatively independent of spatial imagery components (Kosslyn, Brunn, Cave, & Wallach, 1984; Poltrock & Agnoli, 1986; Poltrock & Brown, 1984). Theoretical assumptions on individual differences in visual imagery are formulated within a framework of general information processing principles. Based on the assumption that data and conceptually driven processes are involved in visual imagery, we turned to the question of whether vivid and non-vivid imagers differ on a primary data or conceptually driven processing level. To induce primary data-driven visual processes, unfamiliar visual patterns were used (Logie, 1986; Phillips, 1983). A visual long-term memory task (Marks, 1973) should involve primary conceptually driven visual processes. The analysis of relationships between self-report measures of visual imagery ability (VVIQ...

Journal ArticleDOI
TL;DR: It is shown in this paper that a one-to-one correspondence can be established between the different stages of the scale-space coder and a well-known model of the human visual system that is based on psychophysical data.


Proceedings Article
18 Jul 1989
TL;DR: An image processing support system, IPSSENS-II, has knowledge bases of image processing techniques; based on these kinds of knowledge, the system automatically generates an image processing process to satisfy the user's requirements.
Abstract: The authors have developed an image processing support system, IPSSENS-II, which has knowledge-bases of image processing techniques. With the aid of the knowledge-bases, even the user without much experience can execute image processing easily. There are two types of knowledge built in the system: the knowledge independent of the contents of a given image; and the knowledge dependent on them. The former is the knowledge of image data types and image processing algorithms, and is described in a frame structure. The latter is based on the experiences of experts on image processing, and is described in IF-THEN type rules. Based on these kinds of knowledge, the system automatically generates an image processing process to satisfy the user's requirements.

Proceedings ArticleDOI
29 Jun 1989
TL;DR: Novel features include that the compression and reconstruction address certain characteristics of the human visual system (HVS), that two-way communication controls a moving "fovea" in the transformation, and that resolution varies over the image.
Abstract: A method for bandwidth-efficient processing of video imagery to be viewed by the teleoperator of a remotely-operated vehicle on which the camera is mounted is described. The method comprises image coding, transmission, and reconstruction. It is assumed that the transmission bandpass is the limiting factor rather than encoding/decoding schemata; that image coding and reconstruction will be done within the general abilities of the NASA/TI Programmable Remapper; and that the ratio of retained local detail to the operator's visual resolution is held constant throughout the large-field image that is seen. Novel features include that the compression and reconstruction address certain characteristics of the human visual system, that two-way communication controls a moving 'fovea' in the transformation, and that resolution varies over the image. Conventional motivations accommodated include the Cartesian raster-scan nature of available imagers and display devices and a need for low bandwidth in the image transmission. Unique image processing hardware, NASA's Programmable Remapper, allows demonstration of the method. Once refined, the technology could be adapted to special purpose imagers and display devices, or otherwise to dedicated image processing hardware.

Patent
27 Jul 1989
TL;DR: In this article, an image is scanned in an orthogonal pattern by means of a lightweight, low-energy-consuming electronic sensing system, thus providing a portable visual aid for the patient.
Abstract: Device for converting visual images into sound sequences, particularly to be used as a visual aid for the blind, operating on fast working pipelined extended parallel operation electronics. An image is scanned in an orthogonal pattern by means of a lightweight, low-energy-consuming electronic sensing system, thus providing a portable visual aid for the patient.

Book ChapterDOI
01 Nov 1989
TL;DR: The area of computer analysis of images for automated detection and classification of objects in a scene has been intensively researched in the recent past; one noted approach is to develop a scheme which a machine can use for accomplishing a particular task.
Abstract: The area of computer analysis of images for automated detection and classification of objects in a scene has been intensively researched in the recent past. Two kinds of approaches may be noted in current and past research in machine perception: (1) to model the functions of biological vision systems, e.g., edge detection by the human visual system, and (2) to develop a scheme which a machine can use for accomplishing a particular task, e.g., automated detection of faulty placement of components on a printed circuit board. The latter approach produces a scheme that is application specific. In developing a scheme for a particular machine perception task one has a wide choice of sensing modalities and techniques to interpret the sensed signals. One is not limited by characteristics of a biological vision system that one is forced to emulate in the first approach, nor even by the restriction that the system emulate only the observed behavior of the biological system.


Book
01 Oct 1989
TL;DR: Parallel processing in the human visual system and an examination of the ten degrees of visual field surrounding fixation are among the topics studied.
Abstract: 1. Parallel Processing in the Human Visual System. 2. Brightness Sense Testing. 3. Critical Flicker Frequency: A New Look at an Old Test. 4. Contrast Sensitivity Testing. 5. New Methods in Clinical Electrophysiology. 6. Examination of the Ten Degrees of Visual Field Surrounding Fixation. 7. Automated Perimetry: Theoretical and Practical Considerations.

Proceedings ArticleDOI
04 Oct 1989
TL;DR: The author suggests an approach based upon data visualization and visual reasoning to transform the data objects and present sample data objects in a visual space so that the user can incrementally formulate the information retrieval request in the visual space.
Abstract: When the database grows larger and larger, the user no longer knows what is in the database. Nor does the user know clearly what should be retrieved. How to get at the data becomes a central problem for very large databases. The author suggests an approach based upon data visualization and visual reasoning. The idea is to transform the data objects and present sample data objects in a visual space. The user can then incrementally formulate the information retrieval request in the visual space. By combining data visualization, visual query, visual examples, and visual clues, the author hopes to come up with better ways of formulating and modifying a user's query. A prototype system using the Visual Language Compiler and the VisualNet is then described.

01 Jan 1989
TL;DR: PARVO, a computer vision system which addresses the problem of fast and generic recognition of unexpected 3D objects from single 2D views, and is shown to successfully compute generic descriptions and then recognize many common man-made objects.
Abstract: We present PARVO, a computer vision system which addresses the problem of fast and generic recognition of unexpected 3D objects from single 2D views. After more than twenty years, the field of computer vision has still not produced any clear understanding of how this complex high-level visual capability is even possible. On the other hand, the human visual system is an existence proof that such a competence is attainable, as demonstrated informally by everyone's daily experience, and more formally, by the results of various psychological studies. Recently RBC, a new human image understanding theory, has been proposed on the basis of some of these psychological results. However, no systematic computational evaluation of its many aspects has yet been reported. Such an evaluation is essential if the theory is ever to play a role in the progress of the computer vision field. The PARVO system discussed in this thesis is a first step towards this goal since its design respects and makes explicit the main assumptions of the proposed theory. It analyses single-view 2D line drawings of 3D objects typical of the ones used in human image understanding studies. It is designed to handle partially occluded objects of different shape and dimension in various spatial orientations and locations in the image plane. The system is shown to successfully compute generic descriptions and then recognize many common man-made objects.

Proceedings ArticleDOI
23 May 1989
TL;DR: The Kodak still video transceiver system is designed to electronically transmit and receive high quality color video images over standard telephone lines and uses human visual system (HVS) sensitivity models to achieve high compression ratios without visible artifacts.
Abstract: The Kodak still video transceiver system is designed to electronically transmit and receive high quality color video images over standard telephone lines. A detailed description of the algorithm used in compressing the digital image data is provided. The algorithm is based on the discrete cosine transform (DCT) and uses human visual system (HVS) sensitivity models to achieve high compression ratios without visible artifacts.

Proceedings ArticleDOI
01 Nov 1989
TL;DR: It is widely recognized that effective image processing and machine vision must involve the use of information at multiple scales, and that models of human vision must be multi-scale as well.
Abstract: It is widely recognized that effective image processing and machine vision must involve the use of information at multiple scales, and that models of human vision must be multi-scale as well. The most commonly used image representations are linear transforms, in which an image is decomposed into a sum of elementary basis functions. Besides being well understood, linear transformations which can be expressed in terms of convolutions provide a useful model of early processing in the human visual system. The following properties are valuable for linear transforms that are to be used in image processing and vision modelling:
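The point that convolutional linear transforms model early visual processing can be made concrete with a difference-of-Gaussians (DoG) band-pass channel, a classic linear model of center-surround filtering. The specific sigmas and kernel radius below are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def dog_channel(signal, s1=1.0, s2=2.0, radius=6):
    """Difference-of-Gaussians band-pass channel: a narrow center
    Gaussian minus a wider surround Gaussian. Because each kernel
    sums to 1, the difference kernel sums to 0 and the channel
    rejects uniform (DC) input away from the borders."""
    g1 = gaussian_kernel(s1, radius)
    g2 = gaussian_kernel(s2, radius)
    return np.convolve(signal, g1 - g2, mode='same')
```

Being a convolution, the channel is exactly linear, which is the property the text highlights as valuable for image processing and vision modelling.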

Book ChapterDOI
01 Jan 1989
TL;DR: This chapter is meant to provide the reader with a general background of the evidence for parallel processing in the primate visual system, to extend that evidence to the human visual system, and finally to speculate that selective components of these parallel pathways may be compromised in certain disease states.
Abstract: Parallel visual pathways subserving separate visual functions have been well documented in animals. Evidence of similar parallel pathways has also been demonstrated in humans. This review is not intended to be a comprehensive overview of parallel processing; a number of such reviews exist (Rowe and Stone, 1977; Lennie, 1980; Stone, 1983; Kaas, 1986; Shapley and Perry, 1986; DeYoe and Van Essen, 1988). This chapter is meant to provide the reader with a general background of the evidence for parallel processing in the primate visual system, to extend that evidence to the human visual system, and finally to speculate that selective components of these parallel pathways may be compromised in certain disease states. This review also serves to make the reader aware of the many classification schemes used in describing parallel pathways and of the tests that may be used to assess the functioning of these pathways.

18 Jul 1989
TL;DR: Models of human vision that are applicable to the prediction of subjective picture quality play an important role in an optimal use of resources in future technological developments.
Abstract: The amount of data involved in digital picture processing provides a challenge even for today's sophisticated computer hardware, in computational speed as well as in the storage capacity required. However, modern communication and office automation concepts substantially rely on the idea of an extensive availability of picture processing and transmission facilities. Reduced resolution of signal dimensions and appropriate coding schemes are thus inevitable prerequisites of such a development. Optimal adjustment of these reduction procedures requires an appropriate measure of image quality. Common S/N measures, however, show only weak correlation with the judgements of human observers. Models of human vision that are applicable to the prediction of subjective picture quality therefore play an important role in an optimal use of resources in future technological developments.

Proceedings ArticleDOI
Gupta
01 Jan 1989
TL;DR: The connection between fuzzy logic and neural networks for the area of computer vision is described; the recently developed calculus of fuzzy logic along with neuron-like computational units appears to be a very powerful tool for the emulation of human-like vision on a computer.
Abstract: Summary form only given, as follows. The emulation of human-like vision on a computer is often the desired goal of robot vision and medical image processing. Human vision possesses some important attributes such as perception and cognition. It is imperative that some aspects of these attributes be captured when emulating the human visual system. The processes of perception, mentation, and cognition imply that objects and images are not crisply perceived, and therefore the more common forms of logic such as binary cannot be used. The recently developed calculus of fuzzy logic along with neuron-like computational units appear to be very powerful tools for the emulation of human-like vision fields on a computer. A description is given of the connection between fuzzy logic and neural networks for the area of computer vision.

Proceedings ArticleDOI
27 Mar 1989
TL;DR: The connection between fuzzy logic and neural networks for the area of computer vision is described and the recently developed calculus of fuzzy logic along with neuron-like computational units appear to be very powerful tools for the emulation of human-like vision fields on a computer.
Abstract: The emulation of human-like vision on a computer is often the desired goal of robot vision and medical image processing. Human vision possesses some important attributes such as "perception" and "cognition". It is imperative that some aspects of these attributes are captured when emulating the human visual system. The processes of perception, mentation, and cognition imply that objects and images are not crisply perceived and, therefore, the more common forms of logic such as binary cannot be used. The recently developed calculus of fuzzy logic along with neuron-like computational units appear to be very powerful tools for the emulation of human-like vision fields on a computer. In this paper, we describe the connection between fuzzy logic and neural networks for the area of computer vision.

Proceedings Article
18 Jul 1989
TL;DR: In this article, a scene adaptive VQ-scheme for 8*8 blocks is proposed, where the dimension is reduced by thresholding DCT-transformed block components and a classification in activity and energy concentration as well as an edge detector is used for setting the threshold value.
Abstract: A scene adaptive VQ-scheme for 8*8 blocks is proposed. The dimension is reduced by thresholding DCT-transformed block components. A classification in activity and energy concentration as well as an edge detector is used for setting the threshold value. Supra-threshold DCT-coefficients are quantized by several vector quantizers using gain/shape-separation and a multistage technique. Both the frequency response and the masking effect of the human visual system are utilized by the VQ.
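The gain/shape separation mentioned above factors each input vector into a scalar gain and a unit-norm shape that are quantized independently, which keeps codebooks small. A minimal sketch follows, with illustrative codebooks rather than anything from the paper; the shape codebook rows are assumed to be unit-norm.

```python
import numpy as np

def gain_shape_quantize(vec, shape_codebook, gain_levels):
    """Gain/shape vector quantization sketch.
    Pick the unit-norm shape codeword with the largest correlation
    with the input, take that correlation as the gain, quantize the
    gain against a scalar codebook, and return indices plus the
    reconstructed vector."""
    corr = shape_codebook @ vec          # correlation with each shape
    idx = int(np.argmax(corr))           # best-matching shape
    gain = corr[idx]
    g_idx = int(np.argmin(np.abs(gain_levels - gain)))
    recon = gain_levels[g_idx] * shape_codebook[idx]
    return idx, g_idx, recon
```

Only the two indices need to be transmitted, which is where the bit-rate saving comes from; a multistage scheme would then quantize the residual `vec - recon` with a second-stage codebook.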

01 Jan 1989
TL;DR: In this article, the authors examine the interaction between pixel intensity profiles and image quality, taking into account the spatiotemporal characteristics of the human visual system, by considering the general problem: given a device with given pixel locations and intensity profiles, what is the set of intensity values that best represents a given spatiotemporal image?
Abstract: All computer-generated images are displayed by intensifying pixels on a display. Each pixel, which extends spatially and temporally, has two relevant properties: its position and its intensity profile. The interaction between position and image quality, which can produce the aliasing of high frequencies onto low ones, has been extensively explored and very successful treatments for the resulting artifacts are well known. There is little practical understanding of the interaction between intensity profile and image quality. This thesis examines the interaction between pixel intensity profiles and image quality, taking into account the spatiotemporal characteristics of the human visual system. The interaction is examined by considering the general problem: given a device with given pixel locations and intensity profiles, what is the set of intensity values that best represents a given spatiotemporal image? Within the restricted but practical case in which pixel locations are periodic in space and time and pixel intensity profiles are identical, two solution techniques are explored. One directly minimizes discrepancies between the desired image and an image generated by a display device. The other chooses pixel intensities to minimize differences in the Fourier domain, with the differences weighted by the corresponding sensitivities of the human visual system. These two techniques are explored in detail for pixels with an exponential temporal intensity profile and a Gaussian spatial profile. Each method is examined in several different norms with pixel intensities constrained and unconstrained. (Constraints are relevant because the contrast possible using unconstrained fitting is very restricted for some devices.) These results are calculated assuming temporal degrees of freedom independent of the spatial ones. Under the same assumption, the spatial intensity profiles were examined.
Throughout these calculations, algorithms that manipulate circulant matrices provide a computationally effective means for determining pixel intensities. When spatial and temporal degrees of freedom are taken together these algorithms can no longer be used because the spatial and temporal responses of the human visual system are not separable. Since solutions require use of less efficient numerical methods, the emphasis in that part of the thesis is on differences between the unseparated solutions and those that are produced using separable approximations.
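The circulant-matrix trick mentioned above works because circulant matrices are diagonalized by the DFT: a solve against a circulant system reduces to pointwise division in the frequency domain, costing O(n log n) instead of O(n^3). A minimal sketch (assuming the system is nonsingular, i.e. `fft(c)` has no zeros):

```python
import numpy as np

def solve_circulant(c, b):
    """Solve C x = b where C is the circulant matrix whose first
    column is c, i.e. C[i, j] = c[(i - j) % n]. The eigenvalues of C
    are fft(c), so the solve is a pointwise division of spectra."""
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))
```

This is exactly the structure that periodic pixel locations with identical intensity profiles produce, and it is what breaks down when the spatial and temporal responses are not separable.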

18 Jul 1989
TL;DR: A model for image understanding is reported which takes as its input the basic transactions involved in image enhancement and image encodement, and generates as its output recommendations as to the best approach or set of processes to analyse an image.
Abstract: Reports the development, using expert system technology, of a model for image understanding which takes as its input the basic transactions involved in image enhancement and image encodement, and generates as its output recommendations as to the best approach or set of processes to analyse an image. The paper examines the broad area of visual processing and utilises the study of human visual information processing as a starting point for the model. Inevitably, this approach draws away from the more statistical/mathematical algorithmic methods of image processing towards the fundamental aspects of psychological study and stresses the need to develop a theory for visual information processing in order to perform the complex task of image understanding.