Biologically Inspired Cognitive Architectures 11, Jan. 2015, 22–28 (pre-publication version)
On the Need for Imagistic Modeling in Story Understanding
Eric Bigelow, Daniel Scarafoni, Lenhart Schubert, and Alex Wilson
University of Rochester
ebigelow@u.rochester.edu, dscarafo@u.rochester.edu,
schubert@cs.rochester.edu, alexwilson@rochester.edu
Abstract
There is ample evidence that human understanding of ordinary language relies in part on a
rich capacity for imagistic mental modeling. We argue that genuine language understanding
in machines will similarly require an imagistic modeling capacity enabling fast construction of
instances of prototypical physical situations and events, whose participants are drawn from a
wide variety of entity types, including animate agents. By allowing fast evaluation of predicates
such as ‘can-see’, ‘under’, and ‘inside’, these model instances support coherent text interpre-
tation. Imagistic modeling is thus a crucial (and not very broadly appreciated) aspect of
the long-standing knowledge acquisition bottleneck in AI. We will illustrate how the need for
imagistic modeling arises even in the simplest first-reader stories for children, and provide an
initial feasibility study to indicate what the architecture of a system combining symbolic with
imagistic understanding might look like.
Keywords: imagistic modeling, natural language understanding, knowledge acquisition bottleneck, NLU
architecture
1 Introduction
“Linguistic terms may not in the first place describe or represent meanings as such, but
rather serve as triggers for activating concepts of human experience, which are far richer
and more flexible than any lexical entry or formalization could possibly represent.”
Thora Tenbrink
Corresponding author. The work was supported in part by ONR Award N00014-11-1-0417 and ONR STTR
subcontract N00014-11-10474.

According to a long-standing line of research in cognitive science and neuroscience (with roots
going back to Wilhelm Wundt and William James, or even Plato), human language understand-
ing relies in part on the ad hoc creation of three-dimensional mental models, and mental images
that correspond to visual projections of those models. For example, Johnson-Laird (1983) cites
empirical evidence that human discourse understanding involves both symbolic representations
and structural analogues of the world, where the latter become particularly prominent when
the discourse provides a relatively determinate description of a configuration of objects. At
the same time, even relatively determinate descriptions leave many details open, so that “a
mental model is in essence a representative sample from the set of possible models satisfying the
description” (ibid., p. 165). As Johnson-Laird argues at length, the importance of such models
lies in the (nondeductive) inferences they enable. While some cognitive scientists have argued
against dual representations in favor of purely propositional ones (e.g., Anderson & Bower 1973),
Kosslyn (1994) reviews the extensive evidence showing that “parts of the brain used in visual
perception are also used in visual mental imagery”, and that visual cortex damage impairs not
only vision but also visualization. He proceeds to propose, and marshal evidence (including
PET scans) for, a general theory of how visual imagery is produced with the aid of both the
visual cortex and the motor system, and the important functions it serves.
In AI, genuine language understanding is still thwarted by the knowledge acquisition (KA)
bottleneck. Before we can overcome that formidable obstacle, whether by machine learning or
other methods, we need at least to identify the kinds of knowledge representations required
for understanding. While a great deal of AI research has addressed the question of what sorts
of symbolic representations could support text comprehension, much less attention has been
devoted to the potential role of mental imagery in that process, despite the insights from cog-
nitive science noted above. This may be due in part to the frequently voiced idea that using
internal three-dimensional models for comprehension would require an “inner eye” or Rylean
homunculus. But this objection lacks cogency from an AI perspective, where the computa-
tional advantages of reasoning about the physical world with the aid of geometrical models and
algorithms are well-established. (One of the earliest examples may be Scott Fahlman’s work on
reasoning about block stacking; see (Fahlman 1973).)
In the following, we illustrate how the need for imagistic representations arises in even the
simplest first-reader stories. We then outline an architecture for integrated symbolic and imagis-
tic modeling and inference, applying this to the previously introduced motivating examples. As
evidence for the feasibility of such an architecture, we illustrate the functioning of two essential
components on which we have been working: a broad-coverage semantic parser that produces
normalized, nonindexical logical forms from English, and a preliminary imagistic modeling sys-
tem (IMS) that allows construction of simple spatial scenes and evaluation of spatial predicates
such as ‘can-see’, ‘under’, and ‘inside’. Much additional work will be required to bring this
approach to fruition, but we contend that components of the type we have been constructing
will be crucial to success in automating genuine language understanding.
Finally, we discuss related work on building imagistic models of text, and then reiterate
our conclusions. The reported work does not in itself alleviate the KA bottleneck. On the
contrary, we are arguing that the challenge is even larger than one might infer from most
work on KA, which tends to treat the term as synonymous with acquisition of relational or
rule-like knowledge. But in underscoring the need for extensive imagistic knowledge in story
understanding, and outlining the possible form and use of such knowledge, we hope to be
providing a better understanding of the challenges facing the natural language understanding
community.

2 The Need for Imagistic Modeling
“The mechanisms that enable humans to tell, understand, and recombine stories separate
human intelligence from that of other primates.” Patrick Winston (2011)
The following is a simple story for beginning readers (from Lesson XXXI, Harris et al. 1889):
1. Oh, Rosy! Do you see that nest in the apple tree?
2. Yes, yes, Frank; I do see it.
3. Has the nest eggs in it, Frank?
4. I think it has, Rosy.
5. I will get into the tree.
6. Then I can peep into the nest.
7. Here I am, in the tree.
8. Now I can see the eggs in the nest.
9. Shall I get the nest for you, Rosy?
10. No, no, Frank! Do not get the nest.
11. Do not get it, I beg you.
12. Please let me get into the tree, too.
13. Well, Rosy, here is my hand.
14. Now! Up, up you go, into the tree.
15. Peep into the nest and see the eggs.
16. Oh, Frank! I see them!
17. The pretty, pretty little eggs!
18. Now, Frank, let us go.
One point where the need for imagistic modeling arises particularly clearly is at sentences
3 and 4. We know from the preceding two sentences that both Rosy and Frank see the nest.
Yet it is clear from sentences 3 and 4 that they cannot see whether there are eggs in the nest,
a fact needed to make sense of their subsequent dialogue and actions. In a purely symbolic
approach, we might try to account for the visual occlusion of the eggs by means of an axiom
stating that to see the contents of a topless container, one must be near it, with one’s head
above it. But there are problems with such an approach: First, the suggested axiom covers only
a minor fraction of the unlimited number of ways in which occlusion can occur. To appreciate
this point, consider the following (constructed) story fragments, where comprehension depends
on a visual occlusion relation:
19. Rosy could not see the nest because of the thick foliage of the apple tree.
20. Jim hid from Frank behind a haystack.
21. With the curtains drawn, Ted did not notice the storm clouds gathering in the sky.
22. He finally found the “missing” glasses right under the newspaper in front of him.
23. Hoping to see the Milky Way, she stepped out of the house.
24. As he approached the mouse, it disappeared under the bed.
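In contrast, a single geometric line-of-sight test covers all six of these occlusion patterns
uniformly. The following is a minimal sketch of such a test (our illustration, not the IMS
described later), in which objects are crudely approximated by axis-aligned bounding boxes and
‘can-see’ reduces to checking whether any box blocks the segment from the viewer’s eye to the
target; all names and the bed/mouse geometry are assumptions made for the example.

from typing import List, NamedTuple, Tuple

Vec = Tuple[float, float, float]

class Box(NamedTuple):
    lo: Vec   # minimum corner (x, y, z)
    hi: Vec   # maximum corner (x, y, z)

def segment_hits_box(a: Vec, b: Vec, box: Box) -> bool:
    """Slab test: does the straight segment from a to b pass through box?"""
    t_enter, t_exit = 0.0, 1.0
    for lo, hi, p, q in zip(box.lo, box.hi, a, b):
        d = q - p
        if abs(d) < 1e-9:                  # segment parallel to this slab
            if not (lo <= p <= hi):
                return False
        else:
            t0, t1 = (lo - p) / d, (hi - p) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
            if t_enter > t_exit:
                return False
    return True

def can_see(eye: Vec, target: Vec, obstacles: List[Box]) -> bool:
    """True iff no obstacle box blocks the eye-to-target sight line."""
    return not any(segment_hits_box(eye, target, box) for box in obstacles)

# Example 24: as the viewer approaches, the mouse disappears under the bed.
bed = Box(lo=(0.0, 0.0, 0.2), hi=(2.0, 1.5, 0.6))   # bed slab, 20 cm off the floor
eye = (4.0, 0.7, 1.7)                               # standing viewer's eye
print(can_see(eye, (2.5, 0.7, 0.05), [bed]))        # mouse beside the bed: True
print(can_see(eye, (1.0, 0.7, 0.05), [bed]))        # mouse under the bed: False

Given foliage boxes around a nest (19), a haystack box (20), or a curtain box across a window
(21), the very same test settles those cases without any case-specific axioms.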
We noted that in order to see into a nest, the viewer should not only have a sufficiently
high vantage point, but also be near the nest. This brings us to another problem with a
purely symbolic approach: “Near” and “next to” relations are crucial to our understanding of
many ordinary actions and situations, yet qualitative symbolic reasoning cannot in general tell
us which of these relations hold in a specified situation. Again some simple examples suffice
to indicate why understanding proximity relations is important in story understanding; the
unnatural (b)-examples serve to draw attention to the proximity relations involved in the more
natural (a)-examples:
25 a. Without sitting up in his bed, Tim closed and put aside the book he had been reading.
b. #Without sitting up in his bed, Tim closed the door he had left open.
26 a. Sam heard the whir of a mountain bike behind him as he walked down the trail.
He quickly moved aside, grumbling.
b. Sam heard the whir of a helicopter behind him as he walked down the trail.
#He quickly moved aside, grumbling.
27 a. Amy was walking her puppy. When it began to yelp, she quickly took it in her arms.
b. Amy was flying her kite. #When it began to careen, she quickly took it in her arms.
28 a. Walking alongside the lioness, Joy Adamson stroked its head.
b. #Walking alongside the giraffe, Joy Adamson stroked its head.
Some of the examples also show that relative sizes and positions matter; clearly such examples
can be multiplied indefinitely.
It is worth noting that proximity problems also arise in cognitive robotics, where a robot
may need to anticipate whether it will be next to an object (e.g., a person, food tray or book)
it plans to interact with, after one or more moves or other actions. The “next-to” problem
was analyzed in (Schubert 1990, 1994), and recent robotics research recognizes that cognitive
robots need to construct three-dimensional spatial models of their environment (e.g., Roy et al.
2004).
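To make the proximity point concrete, here is a minimal sketch of a size-relative ‘next-to’
test, reusing the Box type from the sketch above. The scale factor k and all names are our
illustrative assumptions, not the formulation in the cited analysis; the point is only that
proximity thresholds must scale with the objects involved, so that a mountain bike a step
behind Sam is nearby in a way a helicopter at altitude is not.

import math

def characteristic_size(box: Box) -> float:
    """A rough object scale: the diagonal of its bounding box."""
    return math.dist(box.lo, box.hi)

def gap_between(a: Box, b: Box) -> float:
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    s = 0.0
    for lo1, hi1, lo2, hi2 in zip(a.lo, a.hi, b.lo, b.hi):
        g = max(lo2 - hi1, lo1 - hi2, 0.0)   # per-axis separation
        s += g * g
    return math.sqrt(s)

def next_to(a: Box, b: Box, k: float = 0.5) -> bool:
    """True iff the gap is small relative to the smaller object's size."""
    return gap_between(a, b) <= k * min(characteristic_size(a),
                                        characteristic_size(b))

person = Box(lo=(0.0, 0.0, 0.0), hi=(0.5, 0.3, 1.8))
bike = Box(lo=(1.0, 0.0, 0.0), hi=(2.7, 0.3, 1.0))       # a step behind Sam
heli = Box(lo=(40.0, 0.0, 90.0), hi=(55.0, 4.0, 95.0))   # far overhead
print(next_to(person, bike))   # True: stepping aside makes sense (26a)
print(next_to(person, heli))   # False: stepping aside is pointless (26b)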
As final examples, we quote two more brief stories from a similar source as our first story,
in which the need for modeling spatial relations is quite apparent:
A little girl went in search of flowers for her mother. It was early in the day, and the grass
was wet. Sweet little birds were singing all around her. And what do you think she found
besides flowers? A nest with young birds in it. While she was looking at them, she heard
the mother bird chirp, as if she said, “Do not touch my children, little girl, for I love them
dearly.” (McGuffey 2005, Lesson XLII)
This is a fine day. The sun shines bright. There is a good wind, and my kite flies high.
I can just see it. The sun shines in my eyes; I will stand in the shade of this high fence.
Why, here comes my dog! He was under the cart. Did you see him there? What a good
time we have had! (McGuffey 2005, Lesson XXIX)
In the first story, note the contrast with our opening story: We infer that the girl’s attention
is generally turned downward, perhaps by reference to the prototypical gaze direction of someone
seeking objects on the ground. So the nest she spots is likely to be situated close to or on the
ground, within a field of view lower than the girl’s head. This is confirmed by her ability to look
at the young birds, as well as by the mother bird’s perception that the girl might touch them.
In the second story, the prototypical configuration of a person holding a high-flying kite at the
end of a long string, and the position of the sun high in the sky, are essential to understanding
why the sun shines in the kite-flyer’s eyes. Further, how the shade of a high fence might shield
the kite-flyer from the sun, and why a dog under a cart may or may not be noticeable, are best
understood from (at least rough) physical models of these entities.
3 Combining Linguistic and Imagistic Processing
3.1 Hybrid architecture
Examples like those above indicate the need for an imagistic modeling system (IMS) for mod-
eling spatial relations, supporting a general symbolic understanding and reasoning system but
using techniques that exploit the very special nature of spatial relationships and interactions
of complex objects. We have in mind the kind of hybridization strategy that has been suc-
cessfully pursued in logic-based systems and programming languages that allow for support by
taxonomic, temporal, or arithmetic constraint solvers (e.g., Frisch 1990), and more broadly in
specialist-supported reasoners such as that described in (Schubert et al. 1987), or several of
those in (Melis 1993).
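In outline, the dispatch pattern we have in mind might look as follows; this is a minimal
sketch under our own naming assumptions, not the architecture of any of the cited systems. A
reasoner routes literals with spatial predicates to an imagistic specialist first, and falls
back on symbolic inference when the scene model is silent. The sketch reuses the Box type and
the next_to test from Section 2.

SPATIAL_PREDICATES = {"can-see", "under", "inside", "next-to", "near"}

class ImagisticSpecialist:
    """Settles spatial literals geometrically against an assembled scene.
    Returns True/False when the scene is decisive, None when it is silent."""
    def __init__(self, scene: dict):
        self.scene = scene                     # entity name -> Box

    def evaluate(self, pred, args):
        boxes = [self.scene.get(x) for x in args]
        if any(b is None for b in boxes):
            return None                        # entity not modeled: agnostic
        if pred == "inside":
            a, b = boxes                       # a inside b: containment test
            return all(bl <= al and ah <= bh
                       for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))
        if pred == "next-to":
            return next_to(*boxes)
        return None                            # predicate not yet implemented

class HybridReasoner:
    def __init__(self, kb: set, specialist: ImagisticSpecialist):
        self.kb = kb                           # ground literals, standing in
        self.ims = specialist                  # for a full inference engine

    def holds(self, pred, *args):
        if pred in SPATIAL_PREDICATES:
            verdict = self.ims.evaluate(pred, args)
            if verdict is not None:            # the specialist was decisive
                return verdict
        return (pred, *args) in self.kb        # fall back to symbolic lookup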
Clearly the IMS will have to include spatial prototypes for a very large number of ordinary
natural objects and artifacts, their typical poses, and their typical positioning in relation to other
objects (e.g., fruits on fruit trees, trees standing on the ground, people standing, sitting, lying
down, etc., birds flying, nesting, or on a branch, tables and chairs in a room, etc.). Further, we need
means to assemble scenes ad hoc from such prototypes in accordance with verbal descriptions
and symbolic knowledge, and crucially, means of “reading off” properties and relations from
the assembled scenes. We consider the construction of such an IMS an important (and not very
broadly appreciated) aspect of the knowledge acquisition bottleneck in AI.
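To suggest what a prototype entry and ad hoc scene assembly might look like, here is a minimal
sketch, again reusing the Box type from Section 2; the sizes, the fractional-placement
convention, and all names are illustrative assumptions. The point is that a prototype carries
default geometry and typical attachments, so that “a nest in the apple tree” can be
instantiated even though the story never mentions sizes or heights.

from dataclasses import dataclass, field

@dataclass
class Prototype:
    name: str
    size: tuple                                   # (width, depth, height), meters
    supports: dict = field(default_factory=dict)  # child name -> fractional position

TREE = Prototype("apple-tree", size=(4.0, 4.0, 5.0),
                 supports={"nest": (0.5, 0.5, 0.7)})   # nests sit well up the crown
NEST = Prototype("nest", size=(0.15, 0.15, 0.08))

def instantiate(proto: Prototype, anchor=(0.0, 0.0, 0.0)) -> Box:
    """Place a prototype into the scene at an anchor point, as a Box."""
    (x, y, z), (w, d, h) = anchor, proto.size
    return Box(lo=(x, y, z), hi=(x + w, y + d, z + h))

def place_in(child: Prototype, parent: Prototype, parent_box: Box) -> Box:
    """Place a child at its typical fractional position within the parent."""
    anchor = tuple(lo + f * (hi - lo) for lo, hi, f in
                   zip(parent_box.lo, parent_box.hi, parent.supports[child.name]))
    return instantiate(child, anchor)

tree_box = instantiate(TREE)               # a tree standing on the ground
nest_box = place_in(NEST, TREE, tree_box)  # the nest ends up about 3.5 m high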
Building a broad-coverage IMS will be a major challenge to the AI community, but we argue
in the rest of the paper that it would indeed enable understanding of at least the simple kinds of
stories we have used as examples. We do so by describing in some detail how semantic parsing,
inference from background knowledge, and use of an IMS would interact in the processing of
part of our opening story. The description is at least partially concrete, since we are well
along in the construction of a general semantic interpreter, have an inference engine that can
reason with interpreted input and background knowledge, and have built a simple preliminary
IMS. However, we do not yet have an integrated end-to-end story understanding system, or a
significantly large set of prototypes.
3.2 Illustrating the process
We expect text understanding to proceed sentence-by-sentence, where the first step in sentence
processing is semantic parsing. Our semantic parser can handle several of the simple stories we
have looked at after some small adjustments. For example, we change Rosy’s name to Rosie to
prevent the Charniak parser from treating it as an adjective, and we change the question “Has
the nest eggs in it, Frank?” to the American form “Does the nest have eggs in it, Frank?”. After
some automatic postprocessing of the parse tree, for example to mark prepositional phrases with
their type and to insert traces for dislocated constituents, our semantic interpreter successively
produces (i) an initial logical form (LF) by compositional rules; (ii) an indexical LF in which
quantifiers and connectives are fully scoped and intrasentential anaphors are resolved; (iii)
a deindexed LF with a speech act predicate and with explicit, temporally modified episodic
(event or situation) variables; and (iv) a set of canonicalized formulas derived from the previous
stage by Skolemization, negation scope narrowing, equality substitutions, separation of top-level
conjuncts, and other operations. The following are the parse trees and the set of Episodic Logic
formulas (Schubert & Hwang 2000) derived automatically from sentence (1) of our lead story:
PARSE TREES:
(FRAG (INTJ (UH OH)) (, ,) (NP (NNP ROSIE)) (. !)),
(SQ (AUX DO) (NP (PRP YOU))
(VP (VB SEE) (NP (DT THAT) (NN NEST))
(PP-IN (IN IN) (NP (DT THE)
(NN APPLE) (NN TREE)))) (. ?))
CANONICAL FORMULAS, WITH HEARER IDENTIFIED AS ROSIE:
(SPEAKER DIRECT-OH-TOWARDS*.V ROSIE.NAME),
(NEST8.SK NEST.N), (NEST8.SK NEW-SALIENT-ENTITY*.N),
(TREE9.SK ((NN APPLE.N) TREE.N)),
((SPEAKER ASK.V ROSIE.NAME
(QNOM (YNQ (ROSIE.NAME SEE.V NEST8.SK)))) ** E7.SK),
((SPEAKER ASK.V ROSIE.NAME
(QNOM (YNQ (ROSIE.NAME SEE.V
(THAT (NEST8.SK IN.P TREE9.SK)))))) * E7.SK)
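To indicate how such formulas might ultimately be grounded, the following minimal sketch (our
glue code, not the implemented system) gives a geometric counterpart of the “topless container”
rule from Section 2 and applies it to sentences 3–8 of the story: with NEST8.SK instantiated a
few meters up the tree, as in the prototype sketch above, an eye below the rim cannot see the
eggs, while an eye up in the tree can. Box is the type from the earlier sketches, and all
coordinates are assumptions.

def sees_contents(eye, container: Box) -> bool:
    """An eye sees into a topless container iff it is above the rim and its
    sight line to the interior passes down through the opening."""
    cx = (container.lo[0] + container.hi[0]) / 2   # a representative interior
    cy = (container.lo[1] + container.hi[1]) / 2   # point, just below
    cz = container.hi[2] - 0.01                    # the rim
    rim_z = container.hi[2]
    if eye[2] <= rim_z:
        return False                    # eye below the rim: the walls occlude
    t = (rim_z - eye[2]) / (cz - eye[2])   # where the sight line meets
    x = eye[0] + t * (cx - eye[0])         # the plane of the rim
    y = eye[1] + t * (cy - eye[1])
    return (container.lo[0] <= x <= container.hi[0] and
            container.lo[1] <= y <= container.hi[1])

nest = Box(lo=(2.0, 2.0, 3.5), hi=(2.15, 2.15, 3.58))  # up in the apple tree
print(sees_contents((2.5, 2.1, 1.4), nest))   # Rosie on the ground: False
print(sees_contents((2.1, 2.0, 3.9), nest))   # Frank up in the tree: True

This is exactly the inference needed at sentences 3 and 4: both children see the nest, but
neither can see whether it has eggs in it until one of them climbs.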

References

Anderson, J. R., & Bower, G. H. (1973). Human Associative Memory.

Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of Language, Inference,
and Consciousness. Harvard University Press.

Kosslyn, S. M. (1994). Image and Brain: The Resolution of the Imagery Debate. MIT Press.