Book

Early processing of visual information

01 Jan 1976
TL;DR: It is argued that "non-attentive" vision is in practice implemented by grouping operations and first order discriminations acting on the primal sketch, and that higher-level knowledge should influence the control of, rather than interfere with, the actual data-processing taking place lower down.
Abstract: An introduction is given to a theory of early visual information processing. The theory has been implemented, and examples are given of images at various stages of analysis. It is argued that the first step of consequence is to compute a primitive but rich description of the grey-level changes present in an image. The description is expressed in a vocabulary of kinds of intensity change (EDGE, SHADING-EDGE, EXTENDED-EDGE, LINE, BLOB etc.). Modifying parameters are bound to the elements in the description, specifying their POSITION, ORIENTATION, TERMINATION points, CONTRAST, SIZE and FUZZINESS. This description is obtained from the intensity array by fixed techniques, and it is called the primal sketch. For most images, the primal sketch is large and unwieldy. The second important step in visual information processing is to group its contents in a way that is appropriate for later recognition. From our ability to interpret drawings with little semantic content, one may infer the presence in our perceptual equipment of symbolic processes that can define "place-tokens" in an image in various ways, and can group them according to certain rules. Homomorphic techniques fail to account for many of these grouping phenomena, whose explanations require mechanisms of construction rather than mechanisms of detection. The necessary grouping of elements in the primal sketch may be achieved by a mechanism that has available the processes inferred from above, together with the ability to select items by first order discriminations acting on the elements' parameters. Only occasionally do these mechanisms use downward-flowing information about the contents of the particular image being processed. It is argued that "non-attentive" vision is in practice implemented by these grouping operations and first order discriminations acting on the primal sketch. The class of computations so obtained differs slightly from the class of second order operations on the intensity array. 
The extraction of a form from the primal sketch using these techniques amounts to the separation of figure from ground. It is concluded that most of the separation can be carried out by using techniques that do not depend upon the particular image in question. Therefore, figure-ground separation can normally precede the description of the shape of the extracted form. Up to this point, higher-level knowledge and purpose are brought to bear on only a few of the decisions taken during the processing. This relegates the widespread use of downward-flowing information to a later stage than is found in current machine-vision programs, and implies that such knowledge should influence the control of, rather than interfering with, the actual data-processing that is taking place lower down.
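The primal sketch described above is a list of typed tokens (EDGE, LINE, BLOB, ...) with bound parameters. The grouping step selects tokens by first order discriminations on those parameters and then defines new place-tokens. The Python sketch below illustrates that data flow under stated assumptions. The `Token` fields, the orientation-band test in `select` and the centroid rule in `place_token` are hypothetical choices made for illustration, not Marr's implementation, and TERMINATION points are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """One primal-sketch element (hypothetical fields, illustration only)."""
    kind: str                      # vocabulary item, e.g. "EDGE", "LINE", "BLOB"
    position: Tuple[float, float]  # (x, y) in image coordinates
    orientation: float             # degrees in [0, 180)
    contrast: float
    size: float
    fuzziness: float


def select(tokens: List[Token], kind: str, lo: float, hi: float) -> List[Token]:
    """First order discrimination: keep tokens of one kind whose orientation lies in [lo, hi]."""
    return [t for t in tokens if t.kind == kind and lo <= t.orientation <= hi]


def place_token(group: List[Token]) -> Tuple[float, float]:
    """Define a new place-token at the centroid of a selected group (assumed rule)."""
    if not group:
        raise ValueError("cannot place a token for an empty group")
    xs = [t.position[0] for t in group]
    ys = [t.position[1] for t in group]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Usage with made-up tokens: two roughly horizontal BLOBs and one vertical EDGE.
sketch = [
    Token("BLOB", (1.0, 1.0), 10.0, 0.5, 2.0, 0.1),
    Token("BLOB", (3.0, 1.0), 15.0, 0.4, 2.0, 0.1),
    Token("EDGE", (5.0, 5.0), 90.0, 0.8, 1.0, 0.2),
]
group = select(sketch, "BLOB", 0.0, 30.0)
centre = place_token(group)  # (2.0, 1.0)
```

The selection looks only at a token's own parameters, which is the sense of "first order" here. The new place-token is a constructed position and not a measured intensity.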
Citations
Journal Article
TL;DR: There is a natural uncertainty principle between detection and localization performance, the two main goals, and with this principle a single operator shape is derived that is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.
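The abstract's "simple approximate implementation", in which edges are marked at maxima in the gradient magnitude of a Gaussian-smoothed image, can be sketched in one dimension. This minimal sketch rests on several assumptions: the signal is 1-D, the kernel is truncated at 3σ, borders are handled by repeating the end samples, and the default `sigma` and `threshold` values are arbitrary. It omits the multi-width operators and the feature synthesis described above.

```python
import math
from typing import List


def gaussian_derivative_kernel(sigma: float) -> List[float]:
    """Samples of d/dx of an unnormalised Gaussian, truncated at +/- 3 sigma."""
    r = int(math.ceil(3 * sigma))
    return [-x / sigma**2 * math.exp(-x * x / (2 * sigma**2)) for x in range(-r, r + 1)]


def convolve_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Discrete convolution, same length as the input, border samples repeated."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + r - k, 0), n - 1)  # flipped kernel index, clamped at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def detect_edges_1d(signal: List[float], sigma: float = 2.0, threshold: float = 0.1) -> List[int]:
    """Mark edges at local maxima of |gradient| of the Gaussian-smoothed signal."""
    grad = convolve_same(signal, gaussian_derivative_kernel(sigma))
    edges = []
    for i in range(1, len(grad) - 1):
        m = abs(grad[i])
        # Non-maximum suppression; a tie between neighbours keeps the right-hand sample.
        if m > threshold and m >= abs(grad[i - 1]) and m > abs(grad[i + 1]):
            edges.append(i)
    return edges


step = [0.0] * 20 + [1.0] * 20
print(detect_edges_1d(step))  # [20]
```

Taking the derivative of the Gaussian as the kernel performs the smoothing and the differentiation in a single convolution.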

28,073 citations


Cites background from "Early processing of visual information"

  • ...This is similar to the selection criterion proposed by Marr and Hildreth [18] for choosing between different Laplacian of Gaussian channels....

  • ...In fact, a one-dimensional Marr-Hildreth edge detector is almost identical with the operator we have derived because maxima in the output of a first derivative operator will correspond to zero-crossings in the Laplacian operator as used by Marr and Hildreth....

  • ...The effect of the window function becomes very marked for large operator sizes and it is probably the biggest single reason why operators with large support were not practical until the work of Marr and Hildreth on the Laplacian of Gaussian....

  • ...Directional operators very much like the ones we have derived were suggested by Marr [17], but were discarded in favor of the Laplacian of Gaussian [18]....

  • ...The Marr-Hildreth operator does not use any form of thresholding, but an adaptive thresholding scheme can be used to advantage with our first derivative operator....
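The second excerpt above states a one-dimensional equivalence: maxima in the output of a first-derivative operator correspond to zero-crossings of a second-derivative operator. This can be checked numerically. The minimal sketch below assumes an ideal step, Gaussian kernels truncated at 3σ and clamped borders. The function names are illustrative, and `tol` is a numerical tolerance only, not an edge threshold.

```python
import math
from typing import Callable, List


def sampled(f: Callable[[float, float], float], sigma: float) -> List[float]:
    """Sample a kernel function on integer points in [-3 sigma, 3 sigma]."""
    r = int(math.ceil(3 * sigma))
    return [f(x, sigma) for x in range(-r, r + 1)]


def g1(x: float, s: float) -> float:
    """First derivative of an unnormalised Gaussian."""
    return -x / s**2 * math.exp(-x * x / (2 * s * s))


def g2(x: float, s: float) -> float:
    """Second derivative of the same Gaussian (1-D analogue of the Laplacian of Gaussian)."""
    return (x * x / s**4 - 1 / s**2) * math.exp(-x * x / (2 * s * s))


def convolve_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Discrete convolution, same length as the input, border samples repeated."""
    r, n = len(kernel) // 2, len(signal)
    return [
        sum(w * signal[min(max(i + r - k, 0), n - 1)] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def derivative_peaks(signal: List[float], sigma: float, tol: float = 1e-9) -> List[int]:
    """Indices where |first derivative| has a local maximum (ties keep the right sample)."""
    d = [abs(v) for v in convolve_same(signal, sampled(g1, sigma))]
    return [i for i in range(1, len(d) - 1)
            if d[i] > tol and d[i] >= d[i - 1] and d[i] > d[i + 1]]


def zero_crossings(signal: List[float], sigma: float) -> List[int]:
    """Indices i where the second-derivative response changes sign strictly between i and i + 1."""
    d2 = convolve_same(signal, sampled(g2, sigma))
    return [i for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]


step = [0.0] * 20 + [1.0] * 20
print(derivative_peaks(step, 2.0))  # [20]
print(zero_crossings(step, 2.0))    # [19]: the sign change lies between samples 19 and 20
```

On this input the first-derivative peak sits on the right-hand side of the sign change, so the two detectors agree to within one sample.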

Journal Article
TL;DR: The theory of edge detection explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround ∇²G filters acting on the image forms the basis for a physiological model of simple cells.
Abstract: A theory of edge detection is presented. The analysis proceeds in two parts. (1) Intensity changes, which occur in a natural image over a wide range of scales, are detected separately at different scales. An appropriate filter for this purpose at a given scale is found to be the second derivative of a Gaussian, and it is shown that, provided some simple conditions are satisfied, these primary filters need not be orientation-dependent. Thus, intensity changes at a given scale are best detected by finding the zero values of ∇²G(x,y) * I(x,y) for image I, where G(x,y) is a two-dimensional Gaussian distribution and ∇² is the Laplacian. The intensity changes thus discovered in each of the channels are then represented by oriented primitives called zero-crossing segments, and evidence is given that this representation is complete. (2) Intensity changes in images arise from surface discontinuities or from reflectance or illumination boundaries, and these all have the property that they are spatially localized. Because of this, the zero-crossing segments from the different channels are not independent, and rules are deduced for combining them into a description of the image. This description is called the raw primal sketch. The theory explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround ∇²G filters acting on the image forms the basis for a physiological model of simple cells (see Marr & Ullman 1979).
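The zero-crossing rule in this abstract (find the zero values of ∇²G * I) can be sketched with NumPy. This minimal sketch rests on several assumptions: the kernel is unnormalised and truncated at 3σ, image borders are edge-replicated, and a zero-crossing is taken to be a strict sign change between 4-neighbours. It does not build oriented zero-crossing segments or combine channels, and all function names are illustrative.

```python
import numpy as np


def log_kernel(sigma: float) -> np.ndarray:
    """Sampled Laplacian of an unnormalised 2-D Gaussian, truncated at +/- 3 sigma."""
    r = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    rr = x**2 + y**2
    return (rr / sigma**4 - 2 / sigma**2) * np.exp(-rr / (2 * sigma**2))


def log_response(image: np.ndarray, sigma: float) -> np.ndarray:
    """Filter the image with the kernel above, border pixels repeated.

    The kernel is symmetric, so this correlation equals the convolution in the text.
    """
    k = log_kernel(sigma)
    r = k.shape[0] // 2
    h, w = image.shape
    padded = np.pad(image.astype(float), r, mode="edge")
    out = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += k[dy + r, dx + r] * padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out


def zero_crossings(resp: np.ndarray) -> np.ndarray:
    """Mark pixels whose right or lower neighbour has a strictly opposite sign."""
    zc = np.zeros(resp.shape, dtype=bool)
    zc[:, :-1] |= resp[:, :-1] * resp[:, 1:] < 0
    zc[:-1, :] |= resp[:-1, :] * resp[1:, :] < 0
    return zc


image = np.zeros((20, 20))
image[:, 10:] = 1.0                      # a vertical step edge
zc = zero_crossings(log_response(image, 1.0))
print(np.unique(np.argwhere(zc)[:, 1]))  # column(s) where crossings were marked
```

The kernel is rotationally symmetric, so one filter serves for edges of every orientation. This matches the abstract's point that the primary filters need not be orientation-dependent.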

6,893 citations


Cites background or methods from "Early processing of visual information"

  • ...The first primitive description of the image was called the primal sketch (Marr 1976b) and it is formed in two parts....

  • ...The model does, however, make explicit certain nonlinear features that we regard as critical, and it forms the starting point for the more complete proposal of Marr & ...

  • ...A major difficulty with natural images is that changes can and do occur over a wide range of scales (Marr 1976a, b)....

  • ...Following Marr (1976b), the closed contours we call BLOBS, and assign to them a length, width, orientation and (average) contrast; and the terminations are assigned a position and orientation (see figure 7c)....

  • ...…idea is that if a simple cell truly signals either the positive or the negative part of the linear convolution of its bar-shaped receptive field with the image intensity, it can hardly be thought of as making some symbolic assertion about the presence of a bar in the image (Marr 1976~3, p. 648)....
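The Marr-Hildreth abstract argues that zero-crossings from different channels are not independent, because the physical changes that cause them are spatially localized. One simple way to express this is to keep a zero-crossing only when a channel of a different size has one nearby. This is a hypothetical sketch. The position lists, the `tolerance` parameter and the single pairwise test are assumptions, and the paper's own combination rules are more detailed.

```python
from typing import List


def coincident(fine: List[float], coarse: List[float], tolerance: float) -> List[float]:
    """Keep fine-channel zero-crossing positions that have a coarse-channel partner within tolerance."""
    return [p for p in fine if any(abs(p - q) <= tolerance for q in coarse)]


# Hypothetical zero-crossing positions (in pixels) from two channels of different size.
fine_channel = [10.0, 25.0, 40.0]
coarse_channel = [11.0, 60.0]
supported = coincident(fine_channel, coarse_channel, tolerance=2.0)  # [10.0]
```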

Book
01 Jan 1980
TL;DR: With a foreword by Norbert Hornstein, this book discusses mind and body, knowledge of grammar, the biological basis of language capacities, and language and unconscious knowledge.
Abstract: Foreword by Norbert Hornstein; Preface; Part I: 1 Mind and Body, 2 Structures, Capacities, and Conventions, 3 Knowledge of Grammar, 4 Some Elements of Grammar; Part II: 5 On the Biological Basis of Language Capacities, 6 Language and Unconscious Knowledge; Notes; Index

2,930 citations

Journal Article
TL;DR: The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images; the approach is applied here to the problem of representing three-dimensional shapes for the purpose of recognition.
Abstract: The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images. In this paper, we apply this approach to the problem of representing three-dimensional shapes for the purpose of recognition.
1. Three criteria (accessibility; scope and uniqueness; stability and sensitivity) are presented for judging the usefulness of a representation for shape recognition.
2. Three aspects of a representation's design are considered: (i) the representation's coordinate system, (ii) its primitives, which are the primary units of shape information used in the representation, and (iii) the organization the representation imposes on the information in its descriptions.
3. In terms of these design issues and the criteria presented, a shape representation for recognition should: (i) use an object-centred coordinate system, (ii) include volumetric primitives of varied sizes, and (iii) have a modular organization. A representation based on a shape's natural axes (for example the axes identified by a stick figure) follows directly from these choices.
4. The basic process for deriving a shape description in this representation must involve: (i) a means for identifying the natural axes of a shape in its image and (ii) a mechanism for transforming viewer-centred axis specifications to specifications in an object-centred coordinate system.
5. Shape recognition involves: (i) a collection of stored shape descriptions, and (ii) various indexes into the collection that allow a newly derived description to be associated with an appropriate stored description. The most important of these indexes allows shape recognition to proceed conservatively from the general to the specific based on the specificity of the information available from the image.
6. New constraints supplied by a conservative recognition process can be used to extract more information from the image. A relaxation process for carrying out this constraint analysis is described.
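Point 4 of the abstract calls for transforming viewer-centred axis specifications into an object-centred coordinate system. The sketch below is a minimal 2-D version of that idea. It assumes that axes are given as image-plane line segments, and that a component is described by where it starts relative to the principal axis and by its angle to that axis. A real implementation would work with 3-D axes, and all names below are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_object_centred(main_start: Point, main_end: Point,
                      comp_start: Point, comp_end: Point) -> Tuple[float, float, float, float]:
    """Describe a component axis relative to a principal axis (2-D simplification).

    Returns (position along the principal axis, offset across it, relative angle in
    degrees, relative length). Positions and lengths are fractions of the principal
    axis length, so the result does not change under in-plane shifts, rotations or
    changes of scale.
    """
    ax, ay = main_end[0] - main_start[0], main_end[1] - main_start[1]
    length = math.hypot(ax, ay)
    if length == 0:
        raise ValueError("principal axis has zero length")
    ux, uy = ax / length, ay / length   # unit vector along the principal axis
    vx, vy = -uy, ux                    # unit vector perpendicular to it
    px, py = comp_start[0] - main_start[0], comp_start[1] - main_start[1]
    cx, cy = comp_end[0] - comp_start[0], comp_end[1] - comp_start[1]
    along = (px * ux + py * uy) / length
    across = (px * vx + py * vy) / length
    angle = math.degrees(math.atan2(cx * vx + cy * vy, cx * ux + cy * uy))
    return along, across, angle, math.hypot(cx, cy) / length


def transform(p: Point, theta: float = math.radians(30), scale: float = 2.0,
              shift: Point = (7.0, -4.0)) -> Point:
    """Rotate, scale and translate a point (a stand-in for a change of viewpoint in the plane)."""
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * p[0] - s * p[1]) + shift[0], scale * (s * p[0] + c * p[1]) + shift[1])


# A made-up principal axis with a component attached half-way along it, at right angles.
pts = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (5.0, 3.0)]
before = to_object_centred(*pts)                         # approximately (0.5, 0.0, 90.0, 0.3)
after = to_object_centred(*[transform(p) for p in pts])  # the same values, up to rounding
```

The description of the component is the same before and after the simulated change of view. That invariance is the property an object-centred frame is meant to provide.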

2,256 citations

Journal Article
TL;DR: The results of a series of search experiments are interpreted as evidence that focused attention to single items or to groups is required to reduce background activity when the Weber fraction distinguishing the pooled feature activity with displays containing a target and with displays containing only distractors is too small to allow reliable discrimination.
Abstract: In this article we review some new evidence relating to early visual processing and propose an explanatory framework. A series of search experiments tested detection of targets distinguished from the distractors by differences on a single dimension. Our aim was to use the pattern of search latencies to infer which features are coded automatically in early vision. For each of 12 different dimensions, one or more pairs of contrasting stimuli were tested. Each member of a pair played the role of target in one condition and the role of distractor in the other condition. Many pairs gave rise to a marked asymmetry in search latencies, such that one stimulus in the pair was detected either through parallel processing or with small increases in latency as display size increased, whereas the other gave search functions that increased much more steeply. Targets defined by larger values on the quantitative dimensions of length, number, and contrast, by line curvature, by misaligned orientation, and by values that deviated from a standard or prototypical color or shape were detected easily, whereas targets defined by smaller values on the quantitative dimensions, by straightness, by frame-aligned orientation, and by prototypical colors or shapes required slow and apparently serial search. These values appear to be coded by default, as the absence of the contrasting values. We found no feature of line arrangements that allowed automatic, preattentive detection; nor did connectedness or containment—the two examples of topological features that we tested. We interpret the results as evidence that focused attention to single items or to groups is required to reduce background activity when the Weber fraction distinguishing the pooled feature activity with displays containing a target and with displays containing only distractors is too small to allow reliable discrimination.
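The interpretation in the final sentence depends on how the Weber fraction between pooled activities shrinks as displays grow. The back-of-envelope sketch below assumes a single feature map whose activity is simply summed over items. Every distractor contributes the same activity `distractor`, and the target contributes `target`. Under this toy model the fraction is |target - distractor| / (n * distractor). The activity values and the `criterion` for reliable discrimination are hypothetical, not figures from the article.

```python
def weber_fraction(distractor: float, target: float, n_items: int) -> float:
    """Relative difference in summed activity between target-present and target-absent displays."""
    if n_items < 1 or distractor <= 0:
        raise ValueError("need at least one item and positive distractor activity")
    target_absent = n_items * distractor
    target_present = (n_items - 1) * distractor + target
    return abs(target_present - target_absent) / target_absent


def max_display_size(distractor: float, target: float, criterion: float) -> int:
    """Largest display size whose Weber fraction still reaches the criterion (0 if none)."""
    if criterion <= 0:
        raise ValueError("criterion must be positive")
    n = 0
    while weber_fraction(distractor, target, n + 1) >= criterion:
        n += 1
    return n


print(weber_fraction(1.0, 2.0, 4))       # 0.25
print(max_display_size(1.0, 2.0, 0.15))  # 6
```

In this toy model the fraction falls as 1/n, so a fixed criterion is met only up to some display size. Beyond that size, the article's proposal would call on focused attention to single items or groups.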

2,240 citations


Cites background from "Early processing of visual information"

  • ...It is possible that the underlying discrimination of joined versus separate lines is based simply on the number of line ends, four for the separate lines and only two for the angles (cf. Julesz, 1981; Marr, 1976; Treisman & Souther, 1985)....

  • ...Marr (1982) distinguished the goal of early vision—to form a description of the three-dimensional surfaces around us—from that of later vision—to identify or recognize objects and their settings....

References
Journal Article
TL;DR: The single-cell recording method is used to examine receptive fields of a more complex type and to make additional observations on binocular interaction; this approach is necessary for understanding the behaviour of individual cells, but it does not address the relationship of one cell to its neighbours.
Abstract: What chiefly distinguishes cerebral cortex from other parts of the central nervous system is the great diversity of its cell types and interconnexions. It would be astonishing if such a structure did not profoundly modify the response patterns of fibres coming into it. In the cat's visual cortex, the receptive field arrangements of single cells suggest that there is indeed a degree of complexity far exceeding anything yet seen at lower levels in the visual system. In a previous paper we described receptive fields of single cortical cells, observing responses to spots of light shone on one or both retinas (Hubel & Wiesel, 1959). In the present work this method is used to examine receptive fields of a more complex type (Part I) and to make additional observations on binocular interaction (Part II). This approach is necessary in order to understand the behaviour of individual cells, but it fails to deal with the problem of the relationship of one cell to its neighbours. In the past, the technique of recording evoked slow waves has been used with great success in studies of functional anatomy. It was employed by Talbot & Marshall (1941) and by Thompson, Woolsey & Talbot (1950) for mapping out the visual cortex in the rabbit, cat, and monkey. Daniel & Whitteridge (1959) have recently extended this work in the primate. Most of our present knowledge of retinotopic projections, binocular overlap, and the second visual area is based on these investigations. Yet the method of evoked potentials is valuable mainly for detecting behaviour common to large populations of neighbouring cells; it cannot differentiate functionally between areas of cortex smaller than about 1 mm². To overcome this difficulty a method has in recent years been developed for studying cells separately or in small groups during long micro-electrode penetrations through nervous tissue. Responses are correlated with cell location by reconstructing the electrode tracks from histological material.
These techniques have been applied to

12,923 citations

Journal Article
TL;DR: It is shown that rather general numerical constraints roughly determine the dimensions of memorizing models for the mammalian brain, and from these is derived a general model for archicortex.
Abstract: It is proposed that the most important characteristic of archicortex is its ability to perform a simple kind of memorizing task. It is shown that rather general numerical constraints roughly determine the dimensions of memorizing models for the mammalian brain, and from these is derived a general model for archicortex.

2,671 citations

Journal Article
TL;DR: Simple sets of parallel operations are described which can be used to detect texture edges, "spots," and "streaks" in digitized pictures and it is shown that a composite output is constructed in which edges between differently textured regions are detected, and isolated objects are also detected, but the objects composing the textures are ignored.
Abstract: Simple sets of parallel operations are described which can be used to detect texture edges, "spots," and "streaks" in digitized pictures. It is shown that, by comparing the outputs of the operations corresponding to, e.g., edges of different sizes, one can construct a composite output in which edges between differently textured regions are detected, and isolated objects are also detected, but the objects composing the textures are ignored. Relationships between this class of picture processing operations and the Gestalt psychologists' laws of pictorial pattern organization are also discussed.
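The comparison of operator sizes described above can be illustrated in one dimension with a hypothetical difference-of-averages edge operator. This sketch assumes a regular texture whose period divides the larger operator size, and it uses an arbitrary threshold. It shows only why a large operator can ignore texture elements that a small one responds to. It does not reproduce the paper's own rules for comparing sizes, nor its spot and streak operators.

```python
from typing import Dict, List


def edge_response(signal: List[float], x: int, k: int) -> float:
    """|mean of the k samples from x onward - mean of the k samples before x|."""
    return abs(sum(signal[x:x + k]) / k - sum(signal[x - k:x]) / k)


def responses(signal: List[float], k: int) -> Dict[int, float]:
    """Size-k operator output at every position where both windows fit."""
    return {x: edge_response(signal, x, k) for x in range(k, len(signal) - k + 1)}


def peaks(resp: Dict[int, float], threshold: float) -> List[int]:
    """Positions whose response exceeds the threshold and both neighbours."""
    xs = sorted(resp)
    return [x for a, x, b in zip(xs, xs[1:], xs[2:])
            if resp[x] > threshold and resp[x] > resp[a] and resp[x] > resp[b]]


# Two made-up textures: alternating 0/1 on the left, alternating 2/3 on the right.
signal = [float(i % 2) for i in range(20)] + [float(2 + i % 2) for i in range(20, 40)]
small = responses(signal, 1)  # responds equally to every texture element
large = responses(signal, 4)  # averages each texture to a constant level
print(peaks(small, 0.5))      # []
print(peaks(large, 1.0))      # [20]
```

The size-1 operator gives the same output at every texture element and at the region boundary, so it cannot single out the boundary. The size-4 operator responds only where the two textures meet.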

811 citations

Journal Article
TL;DR: Unitary responses to sinusoidal gratings either moving or alternating in phase have been investigated in the optic tract, lateral geniculate body and visual cortex of the cat as a function of the spatial frequency, position of the grating with respect to the cell receptive field and grating contrast.

658 citations