Computational Scene Analysis
Summary
1 Introduction
- Human intelligence can be broadly divided into three aspects: perception, reasoning, and action.
- Section 3 is devoted to a key problem in scene analysis - the binding problem, which concerns how sensory elements are organized into percepts in the brain.
- Section 4 describes oscillatory correlation theory as a biologically plausible representation to address the binding problem.
- In Section 7, I discuss a number of challenging issues facing computational scene analysis.
2 What is the Goal of Computational Scene Analysis?
- In his monumental book on computational vision, Marr makes a compelling case that understanding perceptual information processing requires three different levels of description.
- The second level, called representation and algorithm, is concerned with the representation of the input and the output, and the algorithm that transforms from the input representation to the output representation.
- Before addressing this question, let us ask the question of what purpose perception serves.
- The above goal of computational scene analysis is strongly related to the goal of human scene analysis.
- In particular, the input format is assumed to be similar in both cases.
3 Binding Problem and Temporal Correlation Theory
- The ability to group sensory elements of a scene into coherent objects, often known as perceptual organization or perceptual grouping [40], is a fundamental part of perception.
- How perceptual organization is achieved in the brain remains a mystery.
- Note that object-level attributes, such as shape and size, are undefined before the more fundamental problem of figure-ground separation is solved.
- The correlation theory asserts that the temporal structure of a neuronal signal provides the neural basis for correlation, which in turn serves to bind neuronal responses.
- Eventually, individual objects are coded by individual neurons, and for this reason hierarchical coding is also known as the cardinal cell (or grandmother cell) representation [3].
4 Oscillatory Correlation Theory
- A special form of temporal correlation - oscillatory correlation [52] - has been studied extensively.
- Second, it can desynchronize different assemblies of oscillators that are activated by multiple, simultaneously present objects.
- Within each of the two phases the oscillator exhibits slow-varying behavior.
- Rosenblatt’s perceptrons [46, 47] are classification networks.
- As shown in the figure, the connectedness predicate is correctly computed beyond a beginning period that corresponds to the process of assembly formation.
- Figure caption (partially recovered): A. An input image with 30x30 binary pixels showing a connected cup figure. B. A snapshot from the corresponding LEGION network showing the initial conditions of the network. C. A subsequent snapshot of the network activity. [...] The threshold is indicated by the dashed line. I. The upper three traces show the temporal activities for the three assemblies representing the three connected patterns in the disconnected ‘CUP’ image, the next-to-bottom trace the activity of the global inhibitor, and the bottom one the ratio of the global inhibitor’s frequency to that of enabled oscillators [...]
- The oscillatory correlation theory provides a general framework to address the computational scene analysis problem.
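The two-phase behavior of a single relaxation oscillator can be sketched numerically. This summary does not give the oscillator equations, so the sketch below assumes the Terman-Wang form often used in LEGION networks; the equations, the parameter values (`eps`, `gamma`, `beta`), and the function names are all assumptions made for illustration, not the chapter's definitions.

```python
import math

def simulate_relaxation_oscillator(stimulus=0.8, eps=0.02, gamma=6.0, beta=0.1,
                                   dt=0.01, steps=50_000, x0=-2.0, y0=4.0):
    """Forward-Euler integration of one relaxation oscillator.

    Assumed dynamics (not taken from the summarized text):
        dx/dt = 3x - x^3 + 2 - y + stimulus
        dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)
    x is the fast variable, y the slow recovery variable.
    """
    x, y = x0, y0
    xs = []
    for _ in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + stimulus
        dy = eps * (gamma * (1.0 + math.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        xs.append(x)
    return xs

def count_jumps_up(xs):
    """Count transitions from the silent phase (x < 0) to the active phase (x >= 0)."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a < 0.0 <= b)

xs = simulate_relaxation_oscillator()
active_fraction = sum(1 for x in xs if x > 0.0) / len(xs)
print("jumps up:", count_jumps_up(xs), "active fraction:", round(active_fraction, 2))
```

Under these assumed equations a positive `stimulus` yields repeated alternation between a silent and an active phase, with slow drift inside each phase and fast jumps between them, while a negative `stimulus` leaves the oscillator at rest.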
5 Visual Scene Analysis
- For computational scene analysis, some measure of similarity between features is necessary.
- Elements with similar attributes, such as color, depth, or texture, tend to group.
- As a result, such segmentation gives rise to the notion of a segmentation capacity [69] - at least for networks of relaxation oscillators with a non-instantaneous active phase - that refers to a limited number of oscillator assemblies that may be formed.
- Cesmeli and Wang [8] applied LEGION to motion-based segmentation that considers motion as well as intensity for analyzing image sequences (see also [75]).
- A frame of a motion sequence is shown in Fig. 6A, where a motorcycle rider jumps to a dry canal with his motorcycle while the camera is tracking him.
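As a rough illustration of grouping by feature similarity, the sketch below labels 4-connected pixels whose intensities differ by at most a threshold. It uses plain breadth-first region growing, not LEGION dynamics; the function name `group_by_similarity`, the `max_diff` threshold, and the toy image are hypothetical choices for this example.

```python
from collections import deque

def group_by_similarity(image, max_diff=10):
    """Label 4-connected regions whose neighboring pixels differ by <= max_diff."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labeled"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            # Start a new segment and grow it over similar neighbors.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(image[nr][nc] - image[cr][cc]) <= max_diff):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label

# Toy intensity image with a dark, a bright, and a mid-gray region.
image = [
    [10, 12, 200, 205],
    [11, 13, 198, 202],
    [90, 92,  95, 201],
]
labels, n_segments = group_by_similarity(image)
for row in labels:
    print(row)
```

The similarity measure here is a simple intensity difference; other attributes mentioned above, such as color, depth, texture, or motion, would need their own measures.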
6 Auditory Scene Analysis
- Frequency components that have common temporal modulation tend to be grouped together.
- Their model relies on global connectivity to achieve synchronization among the oscillators that are stimulated at the same time.
- The second layer groups the segments that emerge from the first layer.
- Their model first performs peripheral processing and then auditory segmentation.
- Figure caption (partially recovered): [...] Hz to 5 kHz is employed in peripheral processing. B. A snapshot of the grouping layer. Here, white pixels denote active oscillators that represent the segregated [...] C. Another snapshot showing the segregated background.
- At a conceptual level, a major difference between this model and Wang’s model [63] concerns whether attention can be directed to more than one stream.
- In the Wrigley and Brown model only one stream may be attended to at a time, whereas in Wang’s model attention may be divided among more than one stream.
- This issue will be revisited in Sect. 7.1.
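The principle that frequency components with common temporal modulation tend to group can be illustrated with a toy computation. The sketch below is not the oscillator model described above: it simply correlates synthetic amplitude envelopes and places channels whose envelopes correlate strongly into the same group. The envelope shapes, the 0.9 threshold, and all function names are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def group_by_common_modulation(envelopes, threshold=0.9):
    """Greedily assign each channel to the first group whose seed envelope it matches."""
    groups = []
    for i, env in enumerate(envelopes):
        for group in groups:
            if pearson(env, envelopes[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

rate = 200  # envelope samples per second (hypothetical)
times = [i / rate for i in range(rate)]  # one second of samples

def envelope(mod_hz, gain):
    """Synthetic amplitude envelope modulated at mod_hz."""
    return [gain * (1.0 + math.sin(2.0 * math.pi * mod_hz * t)) for t in times]

# Channels 0 and 2 share a 4 Hz modulation; channels 1 and 3 share a 7 Hz modulation.
envelopes = [envelope(4, 1.0), envelope(7, 2.0), envelope(4, 0.5), envelope(7, 1.0)]
groups = group_by_common_modulation(envelopes)
print(groups)
```

Correlation is insensitive to the gain of each channel, which is why channels with different amplitudes but the same modulation still fall into one group.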
7.1 Attention
- The importance of attention for scene analysis can hardly be overstated.
- The difficulty is illustrated by the finding of Field et al. [18] that a path of curvilinearly aligned (snake-like) orientation elements embedded in a background of randomly oriented elements can be readily detected by observers, whereas other paths cannot.
- [..., 70, 76], capacity limitation is a fundamental property of attention.
- Attention can be either goal-driven or stimulus-driven [73].
- Visual feature dimensions include luminance, color, orientation, motion, and depth.
7.2 Feature-based Analysis versus Model-based Analysis
- Scene analysis can be performed on the basis of the features of the objects in the input scene or the models of the objects in the memory.
- What’s at issue is how much model-based analysis contributes to scene analysis, or whether binding should be part of a recognition process.
- The forward path performs pattern recognition that is robust to a range of variations in position and size, and the last layer stores learned patterns.
- A later model along a similar line was proposed by Riesenhuber and Poggio [44], and it uses a hierarchical architecture similar to the neocognitron.
- This point is illustrated in Figure 9, which shows two frogs in a pond.
7.3 Learning versus Representation
- Learning - both supervised and unsupervised - is central to neural networks (and computational intelligence in general).
- The failure of perceptrons to solve this problem is rooted in the lack of a proper representation, not the lack of a powerful learning method.
- The emphasis on representations contrasts with that on learning.
- The cepstral representation separates voice excitation from vocal tract filtering [22], and the discovery of this representation pays a huge dividend to speech processing tasks including automatic speech recognition where cepstral features are an indispensable part of any state-of-the-art system.
- The above discussion makes it plain that the investigation of computational scene analysis can be characterized in large part as the pursuit of appropriate representations.
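Why a cepstral representation can separate excitation from filtering is easy to check numerically: convolution in time becomes addition in the cepstral domain, because the log magnitude spectrum of a convolution is the sum of the two log magnitude spectra. The sketch below verifies this with a naive DFT on toy signals. The signals `excitation` and `vocal_filter` are hypothetical stand-ins, not speech data, and the real cepstrum used here is one common definition, assumed for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum (real part)."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)]

N = 32
excitation = [0.0] * N
excitation[0], excitation[8] = 1.0, 0.5       # a pulse and a weaker repeat 8 samples later
vocal_filter = [0.6 ** i for i in range(N)]   # toy decaying impulse response
signal = circular_convolve(excitation, vocal_filter)

c_signal = real_cepstrum(signal)
c_excitation = real_cepstrum(excitation)
c_filter = real_cepstrum(vocal_filter)

# Convolution in time should appear as addition of cepstra.
max_error = max(abs(s - (e + f)) for s, e, f in zip(c_signal, c_excitation, c_filter))
# The repeat spacing of the excitation shows up as a cepstral peak.
peak_quefrency = max(range(1, N // 2 + 1), key=lambda q: c_excitation[q])
print(max_error, peak_quefrency)
```

In this toy case the filter contributes mostly at low quefrencies while the excitation contributes at its repeat spacing, which is the kind of separation the text attributes to the cepstral representation.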
8 Concluding Remarks
- In this chapter I have made an effort to define the goal of computational scene analysis explicitly.
- Advances in understanding oscillatory dynamics have led to the development of the oscillatory correlation approach to computational scene analysis, with promising results.
- Natural intelligence ranges from sensation, perceptual organization, language, motor control, to decision making and long-term planning.
- Temporal structure is shared by neuronal responses in all parts of the brain, and the time dimension is flexible and infinitely extensible.
- The bewildering complexity of perception makes it necessary to adopt a compass to guide the way forward and avoid many pitfalls along the way.
Citations
Cites methods from "Computational Scene Analysis"
...Each parameter is described here; note that it is possible to compute these parameters automatically using computational scene analysis (see Wang (2007) for an overview)....
References
"Computational Scene Analysis" refers methods in this paper
...According to the popular feature integration theory of Treisman and Gelade [57], the visual system first analyzes a scene in parallel by separate retinotopic feature maps and focal attention then integrates the analyses within different feature maps to produce a coherent percept....
Frequently Asked Questions (8)
Q2. What is the promising roadmap for understanding scene analysis?
In my view, the Marrian framework for computational perception provides the most promising roadmap for understanding scene analysis.
Q3. What is the common use of the oscillatory correlation approach?
A large number of studies have applied the oscillatory correlation approach to visual scene analysis tasks, including segmentation of range and texture images, extraction of object contours, and selection of salient objects.
Q4. Why do perceptrons fail to solve the connectedness problem?
The failure of perceptrons to solve this problem is rooted in the lack of a proper representation, not the lack of a powerful learning method.
Q5. What is the cause of the connectedness problem?
The cause, as discussed in Sect. 4, is computational complexity - learning the connectedness predicate would require far too many training samples and too much learning time.
Q6. What are the major grouping principles for auditory scene analysis?
When the acoustic input is displayed in a 2-D time-frequency (T-F) representation such as a spectrogram, the major grouping principles for auditory scene analysis (ASA) include the following [6, 13]:
- Proximity in frequency and time.
Q7. What is the significance of the speed of human scene analysis?
For those who are concerned with biological plausibility, the speed of human scene analysis has strong implications on the kind of processing employed.
Q8. What is the pattern of connectivity within the grouping layer?
This pattern of connectivity within the grouping layer promotes synchronization among a group of segments that have common periodicity.