Book Chapter

Computational Scene Analysis

01 Jan 2007, pp. 163–191

Summary

1 Introduction

  • Human intelligence can be broadly divided into three aspects: Perception, reasoning, and action.
  • Section 3 is devoted to a key problem in scene analysis - the binding problem, which concerns how sensory elements are organized into percepts in the brain.
  • Section 4 describes oscillatory correlation theory as a biologically plausible representation to address the binding problem.
  • In Section 7, I discuss a number of challenging issues facing computational scene analysis.

2 What is the Goal of Computational Scene Analysis?

  • In his monumental book on computational vision, Marr makes a compelling case that understanding perceptual information processing requires three different levels of description.
  • The second level, called representation and algorithm, is concerned with the representation of the input and the output, and the algorithm that transforms from the input representation to the output representation.
  • Before addressing this question, let us first ask what purpose perception serves.
  • The above goal of computational scene analysis is strongly related to the goal of human scene analysis.
  • In particular, we assume the input format to be similar in both cases.

3 Binding Problem and Temporal Correlation Theory

  • The ability to group sensory elements of a scene into coherent objects, often known as perceptual organization or perceptual grouping [40], is a fundamental part of perception.
  • How perceptual organization is achieved in the brain remains a mystery.
  • We should note that object-level attributes, such as shape and size, are undefined before the more fundamental problem of figure-ground separation is solved.
  • The correlation theory asserts that the temporal structure of a neuronal signal provides the neural basis for correlation, which in turn serves to bind neuronal responses.
  • Eventually, individual objects are coded by individual neurons, and for this reason hierarchical coding is also known as the cardinal cell (or grandmother cell) representation [3].

4 Oscillatory Correlation Theory

  • A special form of temporal correlation - oscillatory correlation [52] - has been studied extensively.
  • Second, it can desynchronize different assemblies of oscillators that are activated by multiple, simultaneously present objects.
  • Within each of the two phases the oscillator exhibits slow-varying behavior.
  • Rosenblatt’s perceptrons [46, 47] are classification networks.
  • As shown in the figure, the connectedness predicate is correctly computed after an initial period that corresponds to the process of assembly formation.

  • Figure caption: A. An input image with 30×30 binary pixels showing a connected cup figure. B. A snapshot from the corresponding LEGION network showing the initial conditions of the network. C. A subsequent snapshot of the network activity. The threshold is indicated by the dashed line. I. The upper three traces show the temporal activities of the three assemblies representing the three connected patterns in the disconnected 'CUP' image; the next-to-bottom trace shows the activity of the global inhibitor; and the bottom one shows the ratio of the global inhibitor's frequency to that of the enabled oscillators.
  • The oscillatory correlation theory provides a general framework to address the computational scene analysis problem.
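The building block of LEGION is a relaxation oscillator with a fast excitatory variable x and a slow inhibitory variable y. A minimal simulation of a single oscillator can be sketched as follows; the equations follow the Terman-Wang form, but the parameter values and integration settings are illustrative choices, not taken from the chapter.

```python
import numpy as np

def terman_wang(I=0.5, eps=0.02, gamma=6.0, beta=0.1,
                dt=0.005, steps=40000):
    """Euler simulation of a single Terman-Wang relaxation oscillator,
    the building block of LEGION. Parameter values are illustrative."""
    x, y = -2.0, 0.0          # start in the silent phase
    xs = np.empty(steps)
    for t in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + I                 # fast variable
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)  # slow inhibition
        x += dt * dx
        y += dt * dy
        xs[t] = x
    return xs

xs = terman_wang()
# With positive input I, the oscillator alternates between an active phase
# (x near +2) and a silent phase (x near -2), jumping between the two.
```

In a full LEGION network, locally coupled oscillators of this kind synchronize within an assembly while a global inhibitor desynchronizes different assemblies.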

5 Visual Scene Analysis

  • For computational scene analysis, some measure of similarity between features is necessary.
  • Elements with similar attributes, such as color, depth, or texture, tend to group.
  • As a result, such segmentation gives rise to the notion of a segmentation capacity [69], at least for networks of relaxation oscillators with a non-instantaneous active phase: only a limited number of oscillator assemblies may be formed.
  • Cesmeli and Wang [8] applied LEGION to motion-based segmentation that considers motion as well as intensity for analyzing image sequences (see also [75]).
  • A frame of a motion sequence is shown in Fig. 6A, where a motorcycle rider jumps into a dry canal with his motorcycle while the camera tracks him.
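The similarity-based grouping described above is typically encoded in the coupling weights between neighboring oscillators: similar pixels couple strongly and synchronize into one assembly. The weight function below, of the form s/(s + |ΔI|), is a hypothetical choice for illustration; actual LEGION image-segmentation models use task-specific weight functions.

```python
import numpy as np

def coupling_weights(image, scale=255.0):
    """Connection weights between horizontally adjacent pixels, set by
    intensity similarity. The s/(s + |dI|) form is illustrative only."""
    diff = np.abs(np.diff(image.astype(float), axis=1))
    return scale / (scale + diff)

img = np.array([[10, 12, 11, 200, 205],
                [11, 13, 12, 198, 204]])
w = coupling_weights(img)
# Weights inside a homogeneous region stay near 1; the weight across the
# 11 -> 200 boundary drops sharply, so the two regions desynchronize.
```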

6 Auditory Scene Analysis

  • Frequency components that have common temporal modulation tend to be grouped together.
  • Their model relies on global connectivity to achieve synchronization among the oscillators that are stimulated at the same time.
  • The second layer groups the segments that emerge from the first layer.
  • Their model first performs peripheral processing and then auditory segmentation.

  • Figure caption: A. A frequency range of … Hz to 5 kHz is employed in peripheral processing. B. A snapshot of the grouping layer; white pixels denote active oscillators that represent the segregated foreground. C. Another snapshot showing the segregated background.
  • At a conceptual level, a major difference between this model and Wang's model [63] concerns whether attention can be directed to more than one stream: in the Wrigley and Brown model only one stream may be attended to at a time, whereas in Wang's model attention may be divided among more than one stream.
  • This issue will be revisited in Sect. 7.1.
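The common-modulation grouping cue mentioned at the start of this section can be sketched as an envelope-correlation test across frequency channels. The function name, correlation threshold, and synthetic envelopes below are all invented for illustration.

```python
import numpy as np

def group_by_modulation(envelopes, threshold=0.9):
    """Group frequency channels whose amplitude envelopes are strongly
    correlated, a simple stand-in for the common-modulation grouping cue.
    Returns a list of channel-index groups (threshold is arbitrary)."""
    n = len(envelopes)
    corr = np.corrcoef(envelopes)
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [j for j in range(n)
                 if j not in assigned and corr[i, j] > threshold]
        assigned.update(group)
        groups.append(group)
    return groups

t = np.linspace(0, 1, 1000)
mod_a = 1 + 0.5 * np.sin(2 * np.pi * 4 * t)   # 4 Hz modulation
mod_b = 1 + 0.5 * np.sin(2 * np.pi * 7 * t)   # 7 Hz modulation
envelopes = np.array([mod_a, 1.2 * mod_a, mod_b, 0.8 * mod_b])
groups = group_by_modulation(envelopes)
# Channels 0 and 1 share the 4 Hz modulation and group together;
# channels 2 and 3 share the 7 Hz modulation and form a second group.
```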

7.1 Attention

  • The importance of attention for scene analysis can hardly be overstated.
  • The difficulty is illustrated by the finding of Field et al. [18] that a path of curvilinearly aligned (snake-like) orientation elements embedded in a background of randomly oriented elements can be readily detected by observers, whereas other paths cannot.
  • As noted in [70, 76], capacity limitation is a fundamental property of attention.
  • Attention can be either goal-driven or stimulus-driven [73].
  • Visual feature dimensions include luminance, color, orientation, motion, and depth.

7.2 Feature-based Analysis versus Model-based Analysis

  • Scene analysis can be performed on the basis of the features of the objects in the input scene or the models of the objects in the memory.
  • What’s at issue is how much model-based analysis contributes to scene analysis, or whether binding should be part of a recognition process.
  • The forward path performs pattern recognition that is robust to a range of variations in position and size, and the last layer stores learned patterns.
  • A later model along a similar line was proposed by Riesenhuber and Poggio [44], and it uses a hierarchical architecture similar to the neocognitron.
  • This point is illustrated in Figure 9 which shows two frogs in a pond.

7.3 Learning versus Representation

  • Learning - both supervised and unsupervised - is central to neural networks (and computational intelligence in general).
  • The failure of perceptrons to solve this problem is rooted in the lack of a proper representation, not the lack of a powerful learning method.
  • The emphasis on representation contrasts with the emphasis on learning.
  • The cepstral representation separates voice excitation from vocal tract filtering [22], and the discovery of this representation has paid a huge dividend in speech processing tasks, including automatic speech recognition, where cepstral features are an indispensable part of any state-of-the-art system.
  • The above discussion makes it plain that the investigation of computational scene analysis can be characterized in large part as the pursuit of appropriate representations.
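The cepstral representation mentioned above can be computed in a few lines: the real cepstrum is the inverse Fourier transform of the log magnitude spectrum, so low quefrencies capture the spectral envelope (vocal tract) while a periodic excitation shows up as a peak at the quefrency equal to its period. The synthetic harmonic signal below is a toy stand-in for voiced speech.

```python
import numpy as np

def real_cepstrum(x, eps=1e-9):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    eps avoids taking the log of zero-magnitude bins."""
    spectrum = np.abs(np.fft.fft(x)) + eps
    return np.fft.ifft(np.log(spectrum)).real

# A synthetic periodic signal: 10 harmonics of a 64-sample period.
n = np.arange(512)
x = sum(np.cos(2 * np.pi * h * n / 64) for h in range(1, 11))
c = real_cepstrum(x)
peak = np.argmax(c[10:100]) + 10
# The cepstral peak lands at the 64-sample excitation period.
```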

8 Concluding Remarks

  • In this chapter I have made an effort to define the goal of computational scene analysis explicitly.
  • Advances in understanding oscillatory dynamics have led to the development of the oscillatory correlation approach to computational scene analysis, with promising results.
  • Natural intelligence ranges from sensation, perceptual organization, language, motor control, to decision making and long-term planning.
  • Temporal structure is shared by neuronal responses in all parts of the brain, and the time dimension is flexible and infinitely extensible.
  • The bewildering complexity of perception makes it necessary to adopt a compass to guide the way forward and avoid many pitfalls along the way.


Computational Scene Analysis
DeLiang Wang
Department of Computer Science & Engineering and Center for Cognitive Science
The Ohio State University
Columbus, OH 43210-1277, U.S.A.
dwang@cse.ohio-state.edu
Summary. A remarkable achievement of the perceptual system is its scene anal-
ysis capability, which involves two basic perceptual processes: the segmentation of
a scene into a set of coherent patterns (objects) and the recognition of memorized
ones. Although the perceptual system performs scene analysis with apparent ease,
computational scene analysis remains a tremendous challenge as foreseen by Frank
Rosenblatt. This chapter discusses scene analysis in the field of computational intel-
ligence, particularly visual and auditory scene analysis. The chapter first addresses
the question of the goal of computational scene analysis. A main reason why scene
analysis is difficult in computational intelligence is the binding problem, which refers
to how a collection of features comprising an object in a scene is represented in a
neural network. In this context, temporal correlation theory is introduced as a bio-
logically plausible representation for addressing the binding problem. The LEGION
network lays a computational foundation for oscillatory correlation, which is a special
form of temporal correlation. Recent results on visual and auditory scene analysis
are described in the oscillatory correlation framework, with emphasis on real-world
scenes. Also discussed are the issues of attention, feature-based versus model-based
analysis, and representation versus learning. Finally, the chapter points out that
the time dimension and David Marr’s framework for understanding perception are
essential for computational scene analysis.
1 Introduction
Human intelligence can be broadly divided into three aspects: Perception,
reasoning, and action. The first is mainly concerned with analyzing the in-
formation in the environment gathered by the five senses, and the last is
primarily concerned with acting on the environment. In other words, percep-
tion and action are about input and output, respectively, from the viewpoint of
the intelligent agent (i.e. a human being). Reasoning involves higher cognitive
functions such as memory, planning, language understanding, and decision
From Challenges for Computational Intelligence, W. Duch and J. Mandziuk
(Eds.), Springer, Berlin, 2007, pp. 163–191.

making, and is at the core of traditional artificial intelligence [49]. Reasoning
also serves to connect perception and action, and the three aspects interact
with one another to form the whole of intelligence.
This chapter is about perception - we are concerned with how to analyze
the perceptual input, particularly in the visual and auditory domains. Be-
cause perception seeks to describe the physical world, or scenes with objects
located in physical space, perceptual analysis is also known as scene analy-
sis. To differentiate scene analysis by humans and by machines, we term the
latter computational scene analysis.[1] In this chapter I focus on the analy-
sis of a scene into its constituent objects and their spatial positions, not the
recognition of memorized objects. Pattern recognition has been much stud-
ied in computational intelligence, and is treated extensively elsewhere in this
collection.
Although humans, and nonhuman animals, perform scene analysis with
apparent ease, computational scene analysis remains an extremely challenging
problem despite decades of research in fields such as computer vision and
speech processing. The difficulty was recognized by Frank Rosenblatt in his
1962 classic book, “Principles of neurodynamics” [47]. In the last chapter,
he summarized a list of challenges facing perceptrons at the time, and two
problems in the list “represent the most baffling impediments to the advance
of perceptron theory” (p. 580). The two problems are figure-ground separation
and the recognition of topological relations. The field of neural networks has
since made great strides, particularly in understanding supervised learning
procedures for training multilayer and recurrent networks [2, 48]. However,
progress has been slow in addressing Rosenblatt’s two chief problems, largely
validating his foresight.
Rosenblatt’s first problem concerns how to separate a figure from its back-
ground in a scene, and is closely related to the problem of scene segregation:
To decompose a scene into its comprising objects. The second problem con-
cerns how to compute spatial relations between objects in a scene. Since the
second problem presupposes a solution to the first, figure-ground separation is
a more fundamental issue. Both are central problems of computational scene
analysis.
In the next section I discuss the goal of computational scene analysis.
Section 3 is devoted to a key problem in scene analysis - the binding prob-
lem, which concerns how sensory elements are organized into percepts in the
brain. Section 4 describes oscillatory correlation theory as a biologically plau-
sible representation to address the binding problem. The section also reviews
the LEGION[2] network that achieves rapid synchronization and desynchro-
nization, hence providing a computational foundation for the oscillatory cor-
relation theory. The following two sections describe visual and auditory scene
[1] This is consistent with the use of the term Computational Intelligence.
[2] LEGION stands for Locally Excitatory Globally Inhibitory Oscillator Network [68].

analysis separately. In Section 7, I discuss a number of challenging issues facing
computational scene analysis. Finally, Section 8 concludes the chapter.
Note that this chapter does not attempt to survey the large body of liter-
ature on computational scene analysis. Rather, it highlights a few topics that
I consider to be most relevant to this book.
2 What is the Goal of Computational Scene Analysis?
In his monumental book on computational vision, Marr makes a compelling
case that understanding perceptual information processing requires three dif-
ferent levels of description. The first level of description, called computational
theory, is mainly concerned with the goal of computation. The second level,
called representation and algorithm, is concerned with the representation of
the input and the output, and the algorithm that transforms from the input
representation to the output representation. The third level, called hardware
implementation, is concerned with how to physically realize the representation
and the algorithm.
So, what is the goal of computational scene analysis? Before addressing this
question, let us first ask what purpose perception serves. Answers to
this question have been attempted by philosophers and psychologists for ages.
From the information processing perspective, Gibson [21] considers perception
as the way of seeking and gathering information about the environment from
the sensory input. On visual perception, Marr [30] considers that its purpose is
to produce a visual description of the environment for the viewer. On auditory
scene analysis, Bregman states that its goal is to produce separate streams
from the auditory input, where each stream represents a sound source in the
acoustic environment [6]. It is worth emphasizing that the above views suggest
that perception is a private process of the perceiver even though the physical
environment may be common to different perceivers.
In this context, we may state that the goal of computational scene analysis
is to produce a computational description of the objects and their spatial loca-
tions in a physical scene from sensory input. The term ‘object’ here is used in
a modality-neutral way: An object may refer to an image, a sound, a smell,
and so on. In the visual domain, sensory input comprises two retinal images,
and in the auditory domain it comprises two eardrum vibrations. Thus, the
goal of visual scene analysis is to extract visual objects and their locations
from one or two images. Likewise, the goal of auditory scene analysis is to
extract streams from one or two audio recordings.
The above goal of computational scene analysis is strongly related to the
goal of human scene analysis. In particular, we assume the input format to
be similar in both cases. This assumption makes the problem well defined
and has an important consequence: It makes the research in computational
scene analysis perceptually relevant. In other words, progress in computa-
tional scene analysis may shed light on perceptual and neural mechanisms.

This restricted scope also differentiates computational scene analysis from en-
gineering problem solving, where a variety and a number of sensors may be
used.
With common sensory input, we further propose that computational scene
analysis should aim to achieve human level performance. Moreover, we do not
consider the problem solved until a machine system achieves human level
performance in all perceptual environments. That is, computational scene
analysis should aim for the versatile functions of human perception, rather
than its utilities in restricted domains.
3 Binding Problem and Temporal Correlation Theory
The ability to group sensory elements of a scene into coherent objects, often
known as perceptual organization or perceptual grouping [40], is a funda-
mental part of perception. Perceptual organization takes place so rapidly and
effortlessly that it is often taken for granted by us, the perceivers. The diffi-
culty of this task was not fully appreciated until effort in computational scene
analysis started in earnest. How perceptual organization is achieved in the
brain remains a mystery.
Early processing in the perceptual system clearly involves detection of
local features, such as color, orientation, and motion in the visual system, and
frequency and onset in the auditory system. Hence, a closely related question
to perceptual organization is how the responses of feature-detecting neurons
are bound together in the brain to form a perceived scene. This is the well-
known binding problem. At the core of the binding problem is that sensory
input contains multiple objects simultaneously and, as a result, the issue of
which features should bind with which others must be resolved in object
formation. I illustrate the situation with two objects - a triangle and a square
- at two different locations: The triangle is at the top and the square is at the
bottom. This layout, shown in Figure 1, was discussed by Rosenblatt [47] and
used as an instance of the binding problem by von der Malsburg [60]. Given
feature detectors that respond to triangle, square, top, and bottom, how can
the nervous system bind the locations and the shapes so as to perceive that
the triangle is at the top and the square is at the bottom (correctly), rather
than the square is on top and the triangle is on bottom (incorrectly)? We
should note that object-level attributes, such as shape and size, are undefined
before the more fundamental problem of figure-ground separation is solved.
Hence, I will refer to the binding of local features to form a perceived object,
or a percept, when discussing the binding problem.
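A toy readout illustrates the temporal correlation idea for this example: suppose each active feature detector fires with the phase of the object it belongs to, so that features sharing a phase bind into one percept. The phase tags and grouping tolerance below are invented for illustration; real temporal correlation models use ongoing oscillatory dynamics rather than fixed tags.

```python
# Toy illustration of temporal correlation: each active feature detector
# carries the firing phase of its object, and a readout binds features
# that fire in synchrony. The phase values here are invented.
responses = {
    "triangle": 0.0,   # shape detectors (phase in radians)
    "top":      0.0,   # location detectors
    "square":   3.14,
    "bottom":   3.14,
}

def bind_by_phase(responses, tol=0.5):
    """Group feature labels whose firing phases agree within tol."""
    groups = []
    for feature, phase in responses.items():
        for group in groups:
            if abs(responses[group[0]] - phase) < tol:
                group.append(feature)
                break
        else:
            groups.append([feature])
    return groups

percepts = bind_by_phase(responses)
# -> [['triangle', 'top'], ['square', 'bottom']]: the triangle binds with
# 'top' and the square with 'bottom', resolving the ambiguity of Fig. 1.
```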
How does the brain solve the binding problem? Concerned with shape
recognition in the context of multiple objects, Milner [32] suggested that dif-
ferent objects could be separated in time, leading to synchronization of firing
activity within the neurons activated by the same object. Later von der Mals-
burg [59] proposed a correlation theory to address the binding problem. The

Fig. 1. Illustration of the binding problem. The input consists of a triangle and a
square. There are four feature detectors for triangle, square, top, and bottom. The
binding problem concerns whether the triangle is on top (and the square at bottom)
or the square is on top (and the triangle at bottom).
correlation theory asserts that the temporal structure of a neuronal signal
provides the neural basis for correlation, which in turn serves to bind neu-
ronal responses. In a subsequent paper, von der Malsburg and Schneider [61]
demonstrated the temporal correlation theory in a neural model for segre-
gating two auditory stimuli based on their distinct onset times - an example
of auditory scene analysis that I will come back to in Section 6. This paper
proposed, for the first time, to use neural oscillators to solve a figure-ground
separation task, whereby correlation is realized by synchrony and desynchrony
among neural oscillations. Note that the temporal correlation theory is a the-
ory of representation, concerned with how different objects are represented in
a neural network, not a computational algorithm; that is, the theory does not
address how multiple objects in the input scene are transformed into multi-
ple cell assemblies with different time structures. This is a key computational
issue I will address in the next section.
The main alternative to the temporal correlation theory is the hierarchi-
cal coding hypothesis, which asserts that binding occurs through individual
neurons that are arranged in some cortical hierarchy so that neurons higher
in the hierarchy respond to larger and more specialized parts of an object.
Eventually, individual objects are coded by individual neurons, and for this
reason hierarchical coding is also known as the cardinal cell (or grandmother
cell) representation [3]. Gray [23] presented biological evidence for and against
the hierarchical representation. From the computational standpoint, the hier-

Citations
Chun Liu, Beiqi Shi, Xuan Yang, Nan Li, Hangbin Wu
TL;DR: A modified LEGION segmentation to extract buildings from high-quality digital surface models (DSMs) without assumptions on the underlying structures in the DSM data and without prior knowledge of the number of regions is developed.
Abstract: Researchers have extensively applied Locally Excitatory Globally Inhibitory Oscillator Networks (LEGION) for segmentation. These networks are neural oscillator networks based on biological frameworks, in which each oscillator has excitatory lateral connections to the oscillators in its local neighbourhood, as well as a connection to a global inhibitor. In this paper, we develop a modified LEGION segmentation to extract buildings from high-quality digital surface models (DSMs). The extraction is implemented without assumptions on the underlying structures in the DSM data and without prior knowledge of the number of regions. For complex information hidden in the generated DSM of an urban area, grey level co-occurrence matrix homogeneity is used to measure DSM height texture. We then use this homogeneity to distinguish buildings from trees and identify major oscillator blocks in target buildings, instead of using lateral potential. To segment pixels into different groups, we calculate the weight of the global inhibitor (Wz) from DSM complexity. Building boundaries are traced and regularised after extraction from the segmented DSM. A least squares solution with perpendicular constraints for determining regularised rectilinear building boundaries is proposed, and arc line fitting is performed. This paper presents the concept, algorithms, and procedures of the proposed approach. Experimental results on the Vaihingen region studied in the ISPRS test project are also discussed.

TL;DR: This special issue covers novel advances in scene-analysis research obtained using a combination of psychophysics, computational modelling, neuroimaging and neurophysiology, and presents new empirical and theoretical approaches.
Abstract: We perceive the world as stable and composed of discrete objects even though auditory and visual inputs are often ambiguous owing to spatial and temporal occluders and changes in the conditions of observation. This raises important questions regarding where and how 'scene analysis' is performed in the brain. Recent advances from both auditory and visual research suggest that the brain does not simply process the incoming scene properties. Rather, top-down processes such as attention, expectations and prior knowledge facilitate scene perception. Thus, scene analysis is linked not only with the extraction of stimulus features and formation and selection of perceptual objects, but also with selective attention, perceptual binding and awareness. This special issue covers novel advances in scene-analysis research obtained using a combination of psychophysics, computational modelling, neuroimaging and neurophysiology, and presents new empirical and theoretical approaches. For integrative understanding of scene analysis beyond and across sensory modalities, we provide a collection of 15 articles that enable comparison and integration of recent findings in auditory and visual scene analysis.This article is part of the themed issue 'Auditory and visual scene analysis'.

Beatrix Emo
TL;DR: A new type of area of interest (AOI) termed “choice zones” is proposed, that is relevant for eye tracking research in the built environment, and is defined algorithmically using space-geometric parameters.
Abstract: A new type of area of interest (AOI) termed “choice zones” is proposed, that is relevant for eye tracking research in the built environment. Choice zones are an ex ante measure; this is in contrast to many existing definitions of AOIs which are data-driven. Choice zones are defined algorithmically using space-geometric parameters. The validity of the concept is tested against fixation data from an urban navigation experiment in which participants chose between alternative paths. Findings show that choice zones account for 90% of the fixations clusters. The merit of the measure for applied studies in built environment research is discussed.

TL;DR: Experimental evidence from the listening tests indicated substantial improvements in intelligibility over that attained by human listeners with unprocessed stimuli.
Abstract: Whispered speech can be effectively used for quiet and private communications over mobile phones. It is also the communication means of laryngectomized patients under a regime of voice rest. However, little progress has been made on the enhancement of whispered speech because of its special acoustic characteristics. Recent studies with normal-hearing listeners have reported large gains in speech intelligibility with the binary mask approach. This method retains the time-frequency (T-F) units of the mixture signal that are stronger than the interfering noise (masker) and removes the T-F units where the interfering noise dominates. In this paper, a supervised learning method to enhance whispered speech is introduced. A binary mask estimated by a two-class SVM classifier is used to synthesize the enhanced whisper. Amplitude modulation spectrum (AMS) and frequency modulation spectrum (FMS) are extracted as input to SVM. Speech corrupted at low signal to noise (SNR) levels with different types of maskers is enhanced by this method and presented to normal-hearing listeners for word identification. Experimental evidence from the listening tests indicated substantial improvements in intelligibility over that attained by human listeners with unprocessed stimuli.


16 Dec 2014
TL;DR: This dissertation initially presents an analysis of the current trends in all three research domains, and then elaborates on the methodology that was followed to realize the intended scenario, to investigate the possibility of representing each performer of a dispersed NMP ensemble by a local computer-based musician.
Abstract: The general scope of this work is to investigate potential benefits of Networked Music Performance (NMP) systems by employing techniques commonly found in Machine Musicianship. Machine Musicianship is a research area aiming at developing software systems exhibiting some musical skill such as listening, composing or performing music. A distinct track of this research line, mostly relevant to this work, is computer accompaniment systems. Such systems are expected to accompany human musicians by causally analysing the music being performed and timely responding by synthesizing an accompaniment, or the part of one or more of the remaining members of a performance ensemble. The objective of the present work is to investigate the possibility of representing each performer of a dispersed NMP ensemble, by a local computer-based musician, which constantly listens to the local performance, receives network notifications from remote locations and re-synthesizes the performance of remote peers. Whenever a new musical construct is recognized at the location of each performer, a code representing that construct is communicated to all of the remaining musicians, as low-bandwidth information. Upon reception, the remote audio signal is re-synthesized by splicing pre-recorded audio segments corresponding to the musical construct identified by the received code. Computer accompaniment systems may use any conventional audio synthesis technique to generate the accompaniment. In this work, investigations focus on concatenative music synthesis, in an attempt to preserve all expressive nuances introduced by the interpretation of individual performers. Hence, the research carried out and presented in this dissertation lies on the intersection of three domains, which are NMP, Machine Musicianship and Concatenative Music Synthesis. 
The dissertation initially presents an analysis of the current trends in all three research domains, and then elaborates on the methodology that was followed to realize the intended scenario. Research efforts have led to the development of BoogieNet, a preliminary software prototype implementing the proposed communication scheme for networked musical interactions. Real-time music analysis is achieved by means of audio-to-score alignment techniques and re-synthesis at the receiving end takes place by concatenating pre-recorded and automatically segmented audio units, generated by means of onset detection algorithms. The methodology of the entire process is presented and contrasted with competing analysis/synthesis techniques. Finally, the dissertation presents important implementation details and an experimental evaluation to demonstrate the feasibility of the proposed approach.
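The communication scheme described above (recognize a construct locally, transmit a small code, splice pre-recorded units at the receiver) can be sketched in a few lines. This is a hedged illustration only: the `UNITS` table and the integer codes are hypothetical stand-ins for the automatically segmented recordings used by the actual BoogieNet prototype.

```python
import numpy as np

# Hypothetical pre-recorded audio units, one per musical construct code.
# In the real system these would be automatically segmented recordings
# of the remote performer, selected by onset detection.
UNITS = {
    0: np.linspace(0.0, 1.0, 4),  # stand-in audio unit for construct 0
    1: np.linspace(1.0, 0.0, 4),  # stand-in audio unit for construct 1
}

def resynthesize(codes):
    """Rebuild the remote performance by splicing, in order, the
    pre-recorded unit corresponding to each received construct code."""
    return np.concatenate([UNITS[c] for c in codes])

# Three received codes produce three spliced units (12 samples here).
signal = resynthesize([0, 1, 0])
```

Only the codes cross the network, which is what makes the scheme low-bandwidth: the audio itself is reconstructed entirely from material already stored at the receiving end.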

7 citations

References
Book
01 Jan 2020
TL;DR: In this book, the authors present a comprehensive introduction to the theory and practice of artificial intelligence for modern applications, including game playing, planning and acting, and reinforcement learning with neural networks.
Abstract: The long-anticipated revision of this #1 selling book offers the most comprehensive, state of the art introduction to the theory and practice of artificial intelligence for modern applications. Intelligent Agents. Solving Problems by Searching. Informed Search Methods. Game Playing. Agents that Reason Logically. First-order Logic. Building a Knowledge Base. Inference in First-Order Logic. Logical Reasoning Systems. Practical Planning. Planning and Acting. Uncertainty. Probabilistic Reasoning Systems. Making Simple Decisions. Making Complex Decisions. Learning from Observations. Learning with Neural Networks. Reinforcement Learning. Knowledge in Learning. Agents that Communicate. Practical Communication in English. Perception. Robotics. For computer professionals, linguists, and cognitive scientists interested in artificial intelligence.

16,983 citations

Journal ArticleDOI
TL;DR: A new hypothesis about the role of focused attention is proposed, which offers a new set of criteria for distinguishing separable from integral features and a new rationale for predicting which tasks will show attention limits and which will not.

11,452 citations


"Computational Scene Analysis" refers methods in this paper

  • ...According to the popular feature integration theory of Treisman and Gelade [57], the visual system first analyzes a scene in parallel by separate retinotopic feature maps and focal attention then integrates the analyses within different feature maps to produce a coherent percept....

Journal ArticleDOI
TL;DR: This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.
Abstract: The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory. With regard to the second question, two alternative positions have been maintained. The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus

8,434 citations


Additional excerpts

  • ...Rosenblatt’s perceptrons [46, 47] are classification networks....


Book
01 Jan 1988
TL;DR: The second and third questions are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory as mentioned in this paper.
Abstract: The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory. With regard to the second question, two alternative positions have been maintained. The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus

8,134 citations

Book
01 Jan 1966

6,307 citations

Frequently Asked Questions (8)
Q1. What have the authors contributed in "Computational Scene Analysis"?

In this context, temporal correlation theory is introduced as a biologically plausible representation for addressing the binding problem. 

In my view, the Marrian framework for computational perception provides the most promising roadmap for understanding scene analysis. 

A large number of studies have applied the oscillatory correlation approach to visual scene analysis tasks, including segmentation of range and texture images, extraction of object contours, and selection of salient objects. 

The failure of perceptrons to solve this problem is rooted in the lack of a proper representation, not the lack of a powerful learning method. 

The cause, as discussed in Sect. 4, is computational complexity: learning the connectedness predicate would require far too many training samples and too much learning time. 

With the acoustic input displayed in a 2-D time-frequency (T-F) representation such as a spectrogram, major grouping principles for auditory scene analysis (ASA) are given below [6, 13]:
  • Proximity in frequency and time. 
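The proximity principle can be illustrated on a toy signal: compute a magnitude spectrogram, keep high-energy T-F units, and group units that are adjacent in time or frequency. This is a sketch under assumed parameters (window size, threshold, and 4-connected component labelling are illustrative choices, not the chapter's actual grouping algorithm).

```python
import numpy as np

def stft_mag(x, win=64, hop=32):
    """Magnitude spectrogram from Hann-windowed frames."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq bins, time frames)

def group_by_proximity(mask):
    """Label high-energy T-F units that touch in time or frequency
    (4-connectivity), i.e. group by proximity in the T-F plane."""
    labels = np.zeros(mask.shape, dtype=int)
    groups = 0
    for f in range(mask.shape[0]):
        for t in range(mask.shape[1]):
            if mask[f, t] and labels[f, t] == 0:
                groups += 1
                stack = [(f, t)]
                while stack:
                    i, j = stack.pop()
                    if (0 <= i < mask.shape[0] and 0 <= j < mask.shape[1]
                            and mask[i, j] and labels[i, j] == 0):
                        labels[i, j] = groups
                        stack += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return labels, groups

# Two tones separated by a second of silence form two proximity groups.
sr = 8000
t = np.arange(sr) / sr
x = np.concatenate([np.sin(2 * np.pi * 440 * t),
                    np.zeros(sr),
                    np.sin(2 * np.pi * 880 * t)])
S = stft_mag(x)
mask = S > 0.5 * S.max()
labels, n_groups = group_by_proximity(mask)
```

Units belonging to the same tone are contiguous in the T-F plane and end up with the same label, while the gap of silence separates the two tones into distinct groups.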

For those who are concerned with biological plausibility, the speed of human scene analysis has strong implications for the kind of processing employed. 

This pattern of connectivity within the grouping layer promotes synchronization among a group of segments that have common periodicity.
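The idea that excitatory coupling pulls segments with a common periodicity into synchrony can be illustrated with a minimal sketch. The example below uses Kuramoto-style phase oscillators rather than the relaxation oscillators of LEGION, and the coupling matrix and frequencies are assumed values chosen purely for illustration: oscillators coupled within a group converge in phase, encoding their grouping by synchrony.

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt=0.01):
    """One Euler step of coupled phase oscillators:
    dtheta_i/dt = omega_i + sum_j K[i, j] * sin(theta_j - theta_i)."""
    diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
    return theta + dt * (omega + (K * np.sin(diff)).sum(axis=1))

rng = np.random.default_rng(0)
n = 4
theta = rng.uniform(0, 2 * np.pi, n)      # random initial phases
omega = np.array([1.0, 1.0, 1.5, 1.5])    # each group shares a periodicity
K = np.zeros((n, n))
K[:2, :2] = 2.0                           # couple oscillators 0 and 1
K[2:, 2:] = 2.0                           # couple oscillators 2 and 3

for _ in range(5000):
    theta = kuramoto_step(theta, omega, K)

# Phase difference within the first coupled group shrinks toward zero.
within = abs(np.angle(np.exp(1j * (theta[0] - theta[1]))))
```

After the simulation, each coupled pair oscillates in lockstep while the two groups, having different intrinsic frequencies and no mutual coupling, drift relative to one another; synchrony within a group and desynchrony between groups is the signature of oscillatory correlation.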