Author

# David Marr

Other affiliations: University of Cambridge

Bio: David Marr is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Orientation (computer vision) & Human visual system model. The author has an hindex of 28, co-authored 38 publications receiving 28822 citations. Previous affiliations of David Marr include University of Cambridge.

##### Papers

More filters

••

TL;DR: The theory of edge detection explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround ∇2G filters acting on the image forms the basis for a physiological model of simple cells.

Abstract: A theory of edge detection is presented. The analysis proceeds in two parts. (1) Intensity changes, which occur in a natural image over a wide range of scales, are detected separately at different scales. An appropriate filter for this purpose at a given scale is found to be the second derivative of a Gaussian, and it is shown that, provided some simple conditions are satisfied, these primary filters need not be orientation-dependent. Thus, intensity changes at a given scale are best detected by finding the zero values of delta 2G(x,y)*I(x,y) for image I, where G(x,y) is a two-dimensional Gaussian distribution and delta 2 is the Laplacian. The intensity changes thus discovered in each of the channels are then represented by oriented primitives called zero-crossing segments, and evidence is given that this representation is complete. (2) Intensity changes in images arise from surface discontinuities or from reflectance or illumination boundaries, and these all have the property that they are spatially. Because of this, the zero-crossing segments from the different channels are not independent, and rules are deduced for combining them into a description of the image. This description is called the raw primal sketch. The theory explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround delta 2G filters acting on the image forms the basis for a physiological model of simple cells (see Marr & Ullman 1979).

6,893 citations

•

01 Jan 1982

TL;DR: Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field of visual perception as discussed by the authors, where the process of vision constructs a set of representations, starting from a description of the input image and culminating with three-dimensional objects in the surrounding environment, a central theme and one that has had farreaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis.

Abstract: "David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists. In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis--in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level. Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continuing efforts to integrate knowledge from cognition and computation to understand vision and the brain."--MIT CogNet.

5,482 citations

••

TL;DR: A detailed theory of cerebellar cortex is proposed whose consequence is that the cerebellum learns to perform motor skills and two forms of input—output relation are described, both consistent with the cortical theory.

Abstract: 1.
A detailed theory of cerebellar cortex is proposed whose consequence is that the cerebellum learns to perform motor skills. Two forms of input-output relation are described, both consistent with the cortical theory. One is suitable for learning movements (actions), and the other for learning to maintain posture and balance (maintenance reflexes).
2.
It is known that the cells of the inferior olive and the cerebellar Purkinje cells have a special one-to-one relationship induced by the climbing fibre input. For learning actions, it is assumed that:
(a)
each olivary cell responds to a cerebral instruction for an elemental movement. Any action has a defining representation in terms of elemental movements, and this representation has a neural expression as a sequence of firing patterns in the inferior olive; and
(b)
in the correct state of the nervous system, a Purkinje cell can initiate the elemental movement to which its corresponding olivary cell responds.
3.
Whenever an olivary cell fires, it sends an impulse (via the climbing fibre input) to its corresponding Purkinje cell. This Purkinje cell is also exposed (via the mossy fibre input) to information about the context in which its olivary cell fired; and it is shown how, during rehearsal of an action, each Purkinje cell can learn to recognize such contexts. Later, when the action has been learnt, occurrence of the context alone is enough to fire the Purkinje cell, which then causes the next elemental movement. The action thus progresses as it did during rehearsal.
4.
It is shown that an interpretation of cerebellar cortex as a structure which allows each Purkinje cell to learn a number of contexts is consistent both with the distributions of the various types of cell, and with their known excitatory or inhibitory natures. It is demonstrated that the mossy fibre-granule cell arrangement provides the required pattern discrimination capability.
5.
The following predictions are made.
(a)
The synapses from parallel fibres to Purkinje cells are facilitated by the conjunction of presynaptic and climbing fibre (or post-synaptic) activity. Reprinted with permission of The Physiological Society, Oxford, England.
(b)
No other cerebellar synapses are modifiable.
(c)
Golgi cells are driven by the greater of the inputs from their upper and lower dendritic fields.
6.
For learning maintenance reflexes, 2(a) and 2 (b) are replaced by 2’. Each olivary cell is stimulated by one or more receptors, all of whose activities are usually reduced by the results of stimulating the corresponding Purkinje cell.
7.
It is shown that if (2’) is satisfied, the circuit receptor → olivary cell → Purkinje cell → effector may be regarded as a stabilizing reflex circuit which is activated by learned mossy fibre inputs. This type of reflex has been called a learned conditional reflex, and it is shown how such reflexes can solve problems of maintaining posture and balance.
8.
5(a), and either (2) or (2’) are essential to the theory: 5(b) and 5(c) are not absolutely essential, and parts of the theory could survive the disproof of either.

3,151 citations

••

TL;DR: It is shown that rather general numerical constraints roughly determine the dimensions of memorizing models for the mammalian brain, and from these is derived a general model for archicortex.

Abstract: It is proposed that the most important characteristic of archicortex is its ability to perform a simple kind of memorizing task. It is shown that rather general numerical constraints roughly determine the dimensions of memorizing models for the mammalian brain, and from these is derived a general model for archicortex.

2,671 citations

••

TL;DR: The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images by applying the approach to the problem of representing three-dimensional shapes for the purpose of recognition.

Abstract: The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images. In this paper, we apply this approach to the problem of representing three-dimensional shapes for the purpose of recognition. 1. Three criteria, accessibility, scope and uniqueness, and stability and sensitivity, are presented for judging the usefulness of a representation for shape recognition. 2. Three aspects of a representation9s design are considered, (i) the representation9s coordinate system, (ii) its primitives, which are the primary units of shape information used in the representation, and (iii) the organization the representation imposes on the information in its descriptions. 3. In terms of these design issues and the criteria presented, a shape representation for recognition should: (i) use an object-centred coordinate system, (ii) include volumetric primitives of varied sizes, and (iii) have a modular organization. A representation based on a shape9s natural axes (for example the axes identified by a stick figure) follows directly from these choices. 4. The basic process for deriving a shape description in this representation must involve: (i) a means for identifying the natural axes of a shape in its image and (ii) a mechanism for transforming viewer-centred axis specifications to specifications in an object-centred coordinate system. 5. Shape recognition involves: (i) a collection of stored shape descriptions, and (ii) various indexes into the collection that allow a newly derived description to be associated with an appropriate stored description. The most important of these indexes allows shape recognition to proceed conservatively from the general to the specific based on the specificity of the information available from the image. 6. New constraints supplied by a conservative recognition process can be used to extract more information from the image. A relaxation process for carrying out this constraint analysis is described.

2,256 citations

##### Cited by

More filters

•

[...]

18 Nov 2016

TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

••

TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.

Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.

28,073 citations

••

TL;DR: In this paper, it is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2 /sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions.

Abstract: Multiresolution representations are effective for analyzing the information content of images. The properties of the operator which approximates a signal at a given resolution were studied. It is shown that the difference of information between the approximation of a signal at the resolutions 2/sup j+1/ and 2/sup j/ (where j is an integer) can be extracted by decomposing this signal on a wavelet orthonormal basis of L/sup 2/(R/sup n/), the vector space of measurable, square-integrable n-dimensional functions. In L/sup 2/(R), a wavelet orthonormal basis is a family of functions which is built by dilating and translating a unique function psi (x). This decomposition defines an orthogonal multiresolution representation called a wavelet representation. It is computed with a pyramidal algorithm based on convolutions with quadrature mirror filters. Wavelet representation lies between the spatial and Fourier domains. For images, the wavelet representation differentiates several spatial orientations. The application of this representation to data compression in image coding, texture discrimination and fractal analysis is discussed. >

20,028 citations

••

TL;DR: This work uses snakes for interactive interpretation, in which user-imposed constraint forces guide the snake near features of interest, and uses scale-space continuation to enlarge the capture region surrounding a feature.

Abstract: A snake is an energy-minimizing spline guided by external constraint forces and influenced by image forces that pull it toward features such as lines and edges. Snakes are active contour models: they lock onto nearby edges, localizing them accurately. Scale-space continuation can be used to enlarge the capture region surrounding a feature. Snakes provide a unified account of a number of visual problems, including detection of edges, lines, and subjective contours; motion tracking; and stereo matching. We have used snakes successfully for interactive interpretation, in which user-imposed constraint forces guide the snake near features of interest.

18,095 citations

••

TL;DR: A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.

Abstract: Computational properties of use of biological organisms or to the construction of computers can emerge as collective properties of systems having a large number of simple equivalent components (or neurons). The physical meaning of content-addressable memory is described by an appropriate phase space flow of the state of a system. A model of such a system is given, based on aspects of neurobiology but readily adapted to integrated circuits. The collective properties of this model produce a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size. The algorithm for the time evolution of the state of the system is based on asynchronous parallel processing. Additional emergent collective properties include some capacity for generalization, familiarity recognition, categorization, error correction, and time sequence retention. The collective properties are only weakly sensitive to details of the modeling or the failure of individual devices.

16,652 citations