Biologically Inspired Cognitive Architectures 11, Jan. 2015, 22–28 (pre-publication version)
On the Need for Imagistic Modeling in Story Understanding
Eric Bigelow, Daniel Scarafoni, Lenhart Schubert, and Alex Wilson
University of Rochester
ebigelow@u.rochester.edu, dscarafo@u.rochester.edu,
schubert@cs.rochester.edu, alexwilson@rochester.edu
Abstract
There is ample evidence that human understanding of ordinary language relies in part on a
rich capacity for imagistic mental modeling. We argue that genuine language understanding
in machines will similarly require an imagistic modeling capacity enabling fast construction of
instances of prototypical physical situations and events, whose participants are drawn from a
wide variety of entity types, including animate agents. By allowing fast evaluation of predicates
such as ‘can-see’, ‘under’, and ‘inside’, these model instances support coherent text interpre-
tation. Imagistic modeling is thus a crucial (and not very broadly appreciated) aspect of
the long-standing knowledge acquisition bottleneck in AI. We will illustrate how the need for
imagistic modeling arises even in the simplest first-reader stories for children, and provide an
initial feasibility study to indicate what the architecture of a system combining symbolic with
imagistic understanding might look like.
Keywords: imagistic modeling, natural language understanding, knowledge acquisition bottleneck, NLU
architecture
1 Introduction
“Linguistic terms may not in the first place describe or represent meanings as such, but
rather serve as triggers for activating concepts of human experience, which are far richer
and more flexible than any lexical entry or formalization could possibly represent.”
Thora Tenbrink
Corresponding author. The work was supported in part by ONR Award N00014-11-1-0417 and ONR STTR
subcontract N00014-11-10474.

According to a long-standing line of research in cognitive science and neuroscience (with roots
going back to Wilhelm Wundt and William James, or even Plato), human language understand-
ing relies in part on the ad hoc creation of three-dimensional mental models, and mental images
that correspond to visual projections of those models. For example, Johnson-Laird (1983) cites
empirical evidence that human discourse understanding involves both symbolic representations
and structural analogues of the world, where the latter become particularly prominent when
the discourse provides a relatively determinate description of a configuration of objects. At
the same time, even relatively determinate descriptions leave many details open, so that “a
mental model is in essence a representative sample from the set of possible models satisfying the
description” (ibid., p. 165). As Johnson-Laird argues at length, the importance of such models
lies in the (nondeductive) inferences they enable. While some cognitive scientists have argued
against dual representations in favor of purely propositional ones (e.g., Anderson & Bower 1973),
Kosslyn (1994) reviews the extensive evidence showing that “parts of the brain used in visual
perception are also used in visual mental imagery”, and that visual cortex damage impairs not
only vision but also visualization. He proceeds to propose, and marshal evidence (including
PET scans) for, a general theory of how visual imagery is produced with the aid of both the
visual cortex and the motor system, and the important functions it serves.
In AI, genuine language understanding is still thwarted by the knowledge acquisition (KA)
bottleneck. Before we can overcome that formidable obstacle, whether by machine learning or
other methods, we need at least to identify the kinds of knowledge representations required
for understanding. While a great deal of AI research has addressed the question of what sorts
of symbolic representations could support text comprehension, much less attention has been
devoted to the potential role of mental imagery in that process, despite the insights from cog-
nitive science noted above. This may be due in part to the frequently voiced idea that using
internal three-dimensional models for comprehension would require an “inner eye” or Rylean
homunculus. But this objection lacks cogency from an AI perspective, where the computa-
tional advantages of reasoning about the physical world with the aid of geometrical models and
algorithms are well-established. (One of the earliest examples may be Scott Fahlman’s work on
reasoning about block stacking; see (Fahlman 1973).)
In the following, we illustrate how the need for imagistic representations arises in even the
simplest first-reader stories. We then outline an architecture for integrated symbolic and imagis-
tic modeling and inference, applying this to the previously introduced motivating examples. As
evidence for the feasibility of such an architecture, we illustrate the functioning of two essential
components on which we have been working: a broad-coverage semantic parser that produces
normalized, nonindexical logical forms from English, and a preliminary imagistic modeling sys-
tem (IMS) that allows construction of simple spatial scenes and evaluation of spatial predicates
such as ‘can-see’, ‘under’, and ‘inside’. Much additional work will be required to bring this
approach to fruition, but we contend that components of the type we have been constructing
will be crucial to success in automating genuine language understanding.
Finally, we discuss related work on building imagistic models of text, and then reiterate
our conclusions. The reported work does not in itself alleviate the KA bottleneck. On the
contrary, we are arguing that the challenge is even larger than one might infer from most
work on KA, which tends to treat the term as synonymous with acquisition of relational or
rule-like knowledge. But in underscoring the need for extensive imagistic knowledge in story
understanding, and outlining the possible form and use of such knowledge, we hope to be
providing a better understanding of the challenges facing the natural language understanding
community.

2 The Need for Imagistic Modeling
“The mechanisms that enable humans to tell, understand, and recombine stories separate
human intelligence from that of other primates.” Patrick Winston (2011)
The following is a simple story for beginning readers (from Lesson XXXI, Harris et al. 1889):
1. Oh, Rosy! Do you see that nest in the apple tree?
2. Yes, yes, Frank; I do see it.
3. Has the nest eggs in it, Frank?
4. I think it has, Rosy.
5. I will get into the tree.
6. Then I can peep into the nest.
7. Here I am, in the tree.
8. Now I can see the eggs in the nest.
9. Shall I get the nest for you, Rosy?
10. No, no, Frank! Do not get the nest.
11. Do not get it, I beg you.
12. Please let me get into the tree, too.
13. Well, Rosy, here is my hand.
14. Now! Up, up you go, into the tree.
15. Peep into the nest and see the eggs.
16. Oh, Frank! I see them!
17. The pretty, pretty little eggs!
18. Now, Frank, let us go.
One point where the need for imagistic modeling arises particularly clearly is at sentences
3 and 4. We know from the preceding two sentences that both Rosy and Frank see the nest.
Yet it is clear from sentences 3 and 4 that they cannot see whether there are eggs in the nest,
a fact needed to make sense of their subsequent dialogue and actions. In a purely symbolic
approach, we might try to account for the visual occlusion of the eggs by means of an axiom
stating that to see the contents of a topless container, one must be near it, with one’s head
above it. But there are problems with such an approach: First, the suggested axiom covers only
a minor fraction of the unlimited number of ways in which occlusion can occur. To appreciate
this point, consider the following (constructed) story fragments, where comprehension depends
on a visual occlusion relation:
19. Rosy could not see the nest because of the thick foliage of the apple tree.
20. Jim hid from Frank behind a haystack.
21. With the curtains drawn, Ted did not notice the storm clouds gathering in the sky.
22. He finally found the “missing” glasses right under the newspaper in front of him.
23. Hoping to see the Milky Way, she stepped out of the house.
24. As he approached the mouse, it disappeared under the bed.
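In contrast, a single geometric line-of-sight test covers all six of these occlusion patterns
uniformly. The following is a minimal sketch of such a test (our illustration, not the IMS
described later), in which objects are crudely approximated by axis-aligned bounding boxes and
‘can-see’ reduces to checking whether any box blocks the segment from the viewer’s eye to the
target; all names and the bed/mouse geometry are assumptions made for the example.

from typing import List, NamedTuple, Tuple

Vec = Tuple[float, float, float]

class Box(NamedTuple):
    lo: Vec   # minimum corner (x, y, z)
    hi: Vec   # maximum corner (x, y, z)

def segment_hits_box(a: Vec, b: Vec, box: Box) -> bool:
    """Slab test: does the straight segment from a to b pass through box?"""
    t_enter, t_exit = 0.0, 1.0
    for lo, hi, p, q in zip(box.lo, box.hi, a, b):
        d = q - p
        if abs(d) < 1e-9:                  # segment parallel to this slab
            if not (lo <= p <= hi):
                return False
        else:
            t0, t1 = (lo - p) / d, (hi - p) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
            if t_enter > t_exit:
                return False
    return True

def can_see(eye: Vec, target: Vec, obstacles: List[Box]) -> bool:
    """True iff no obstacle box blocks the eye-to-target sight line."""
    return not any(segment_hits_box(eye, target, box) for box in obstacles)

# Example 24: as the viewer approaches, the mouse disappears under the bed.
bed = Box(lo=(0.0, 0.0, 0.2), hi=(2.0, 1.5, 0.6))   # bed slab, 20 cm off the floor
eye = (4.0, 0.7, 1.7)                               # standing viewer's eye
print(can_see(eye, (2.5, 0.7, 0.05), [bed]))        # mouse beside the bed: True
print(can_see(eye, (1.0, 0.7, 0.05), [bed]))        # mouse under the bed: False

Given foliage boxes around a nest (19), a haystack box (20), or a curtain box across a window
(21), the very same test settles those cases without any case-specific axioms.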
We noted that in order to see into a nest, the viewer should not only have a sufficiently
high vantage point, but also be near the nest. This brings us to another problem with a
purely symbolic approach: “Near” and “next to” relations are crucial to our understanding of
many ordinary actions and situations, yet qualitative symbolic reasoning cannot in general tell
us which of these relations hold in a specified situation. Again some simple examples suffice
to indicate why understanding proximity relations is important in story understanding; the
unnatural (b)-examples serve to draw attention to the proximity relations involved in the more
natural (a)-examples:
25 a. Without sitting up in his bed, Tim closed and put aside the book he had been reading.
b. #Without sitting up in his bed, Tim closed the door he had left open.
26 a. Sam heard the whir of a mountain bike behind him as he walked down the trail.
He quickly moved aside, grumbling.
b. Sam heard the whir of a helicopter behind him as he walked down the trail.
#He quickly moved aside, grumbling.
27 a. Amy was walking her puppy. When it began to yelp, she quickly took it in her arms.
b. Amy was flying her kite. #When it began to careen, she quickly took it in her arms.
28 a. Walking alongside the lioness, Joy Adamson stroked its head.
b. #Walking alongside the giraffe, Joy Adamson stroked its head.
Some of the examples also show that relative sizes and positions matter; clearly such examples
can be multiplied indefinitely.
It is worth noting that proximity problems also arise in cognitive robotics, where a robot
may need to anticipate whether it will be next to an object (e.g., a person, food tray or book)
it plans to interact with, after one or more moves or other actions. The “next-to” problem
was analyzed in (Schubert 1990, 1994), and recent robotics research recognizes that cognitive
robots need to construct three-dimensional spatial models of their environment (e.g., Roy et al.
2004).
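To make the proximity point concrete, here is a minimal sketch of a size-relative ‘next-to’
test, reusing the Box type from the sketch above. The scale factor k and all names are our
illustrative assumptions, not the formulation in the cited analysis; the point is only that
proximity thresholds must scale with the objects involved, so that a mountain bike a step
behind Sam is nearby in a way a helicopter at altitude is not.

import math

def characteristic_size(box: Box) -> float:
    """A rough object scale: the diagonal of its bounding box."""
    return math.dist(box.lo, box.hi)

def gap_between(a: Box, b: Box) -> float:
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    s = 0.0
    for lo1, hi1, lo2, hi2 in zip(a.lo, a.hi, b.lo, b.hi):
        g = max(lo2 - hi1, lo1 - hi2, 0.0)   # per-axis separation
        s += g * g
    return math.sqrt(s)

def next_to(a: Box, b: Box, k: float = 0.5) -> bool:
    """True iff the gap is small relative to the smaller object's size."""
    return gap_between(a, b) <= k * min(characteristic_size(a),
                                        characteristic_size(b))

person = Box(lo=(0.0, 0.0, 0.0), hi=(0.5, 0.3, 1.8))
bike = Box(lo=(1.0, 0.0, 0.0), hi=(2.7, 0.3, 1.0))       # a step behind Sam
heli = Box(lo=(40.0, 0.0, 90.0), hi=(55.0, 4.0, 95.0))   # far overhead
print(next_to(person, bike))   # True: stepping aside makes sense (26a)
print(next_to(person, heli))   # False: stepping aside is pointless (26b)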
As final examples, we quote two more brief stories from a similar source as our first story,
in which the need for modeling spatial relations is quite apparent:
A little girl went in search of flowers for her mother. It was early in the day, and the grass
was wet. Sweet little birds were singing all around her. And what do you think she found
besides flowers? A nest with young birds in it. While she was looking at them, she heard
the mother bird chirp, as if she said, “Do not touch my children, little girl, for I love them
dearly.” (McGuffey 2005, Lesson XLII)
This is a fine day. The sun shines bright. There is a good wind, and my kite flies high.
I can just see it. The sun shines in my eyes; I will stand in the shade of this high fence.
Why, here comes my dog! He was under the cart. Did you see him there? What a good
time we have had! (McGuffey 2005, Lesson XXIX)
In the first story, note the contrast with our opening story: We infer that the girl’s attention
is generally turned downward, perhaps by reference to the prototypical gaze direction of someone
seeking objects on the ground. So the nest she spots is likely to be situated close to or on the
ground, within a field of view lower than the girl’s head. This is confirmed by her ability to look
at the young birds, as well as by the mother bird’s perception that the girl might touch them.
In the second story, the prototypical configuration of a person holding a high-flying kite at the
end of a long string, and the position of the sun high in the sky, are essential to understanding
why the sun shines in the kite-flyer’s eyes. Further, how the shade of a high fence might shield
the kite-flyer from the sun, and why a dog under a cart may or may not be noticeable, are best
understood from (at least rough) physical models of these entities.
3 Combining Linguistic and Imagistic Processing
3.1 Hybrid architecture
Examples like those above indicate the need for an imagistic modeling system (IMS) for mod-
eling spatial relations, supporting a general symbolic understanding and reasoning system but
using techniques that exploit the very special nature of spatial relationships and interactions
of complex objects. We have in mind the kind of hybridization strategy that has been suc-
cessfully pursued in logic-based systems and programming languages that allow for support by
taxonomic, temporal, or arithmetic constraint solvers (e.g., Frisch 1990), and more broadly in
specialist-supported reasoners such as that described in (Schubert et al. 1987), or several of
those in (Melis 1993).
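In outline, the dispatch pattern we have in mind might look as follows; this is a minimal
sketch under our own naming assumptions, not the architecture of any of the cited systems. A
reasoner routes literals with spatial predicates to an imagistic specialist first, and falls
back on symbolic inference when the scene model is silent. The sketch reuses the Box type and
the next_to test from Section 2.

SPATIAL_PREDICATES = {"can-see", "under", "inside", "next-to", "near"}

class ImagisticSpecialist:
    """Settles spatial literals geometrically against an assembled scene.
    Returns True/False when the scene is decisive, None when it is silent."""
    def __init__(self, scene: dict):
        self.scene = scene                     # entity name -> Box

    def evaluate(self, pred, args):
        boxes = [self.scene.get(x) for x in args]
        if any(b is None for b in boxes):
            return None                        # entity not modeled: agnostic
        if pred == "inside":
            a, b = boxes                       # a inside b: containment test
            return all(bl <= al and ah <= bh
                       for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))
        if pred == "next-to":
            return next_to(*boxes)
        return None                            # predicate not yet implemented

class HybridReasoner:
    def __init__(self, kb: set, specialist: ImagisticSpecialist):
        self.kb = kb                           # ground literals, standing in
        self.ims = specialist                  # for a full inference engine

    def holds(self, pred, *args):
        if pred in SPATIAL_PREDICATES:
            verdict = self.ims.evaluate(pred, args)
            if verdict is not None:            # the specialist was decisive
                return verdict
        return (pred, *args) in self.kb        # fall back to symbolic lookup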
Clearly the IMS will have to include spatial prototypes for a very large number of ordinary
natural objects and artifacts, their typical poses, and their typical positioning in relation to other
objects (e.g., fruits on fruit trees, trees standing on the ground, people standing, sitting, lying
down, etc., birds flying, nesting, or on a branch, tables and chairs in a room, etc.). Further, we need
means to assemble scenes ad hoc from such prototypes in accordance with verbal descriptions
and symbolic knowledge, and crucially, means of “reading off” properties and relations from
the assembled scenes. We consider the construction of such an IMS an important (and not very
broadly appreciated) aspect of the knowledge acquisition bottleneck in AI.
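To suggest what a prototype entry and ad hoc scene assembly might look like, here is a minimal
sketch, again reusing the Box type from Section 2; the sizes, the fractional-placement
convention, and all names are illustrative assumptions. The point is that a prototype carries
default geometry and typical attachments, so that “a nest in the apple tree” can be
instantiated even though the story never mentions sizes or heights.

from dataclasses import dataclass, field

@dataclass
class Prototype:
    name: str
    size: tuple                                   # (width, depth, height), meters
    supports: dict = field(default_factory=dict)  # child name -> fractional position

TREE = Prototype("apple-tree", size=(4.0, 4.0, 5.0),
                 supports={"nest": (0.5, 0.5, 0.7)})   # nests sit well up the crown
NEST = Prototype("nest", size=(0.15, 0.15, 0.08))

def instantiate(proto: Prototype, anchor=(0.0, 0.0, 0.0)) -> Box:
    """Place a prototype into the scene at an anchor point, as a Box."""
    (x, y, z), (w, d, h) = anchor, proto.size
    return Box(lo=(x, y, z), hi=(x + w, y + d, z + h))

def place_in(child: Prototype, parent: Prototype, parent_box: Box) -> Box:
    """Place a child at its typical fractional position within the parent."""
    anchor = tuple(lo + f * (hi - lo) for lo, hi, f in
                   zip(parent_box.lo, parent_box.hi, parent.supports[child.name]))
    return instantiate(child, anchor)

tree_box = instantiate(TREE)               # a tree standing on the ground
nest_box = place_in(NEST, TREE, tree_box)  # the nest ends up about 3.5 m high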
Building a broad-coverage IMS will be a major challenge to the AI community, but we argue
in the rest of the paper that it would indeed enable understanding of at least the simple kinds of
stories we have used as examples. We do so by describing in some detail how semantic parsing,
inference from background knowledge, and use of an IMS would interact in the processing of
part of our opening story. The description is at least partially concrete, since we are well
along in the construction of a general semantic interpreter, have an inference engine that can
reason with interpreted input and background knowledge, and have built a simple preliminary
IMS. However, we do not yet have an integrated end-to-end story understanding system, or a
significantly large set of prototypes.
3.2 Illustrating the process
We expect text understanding to proceed sentence-by-sentence, where the first step in sentence
processing is semantic parsing. Our semantic parser can handle several of the simple stories we
have looked at after some small adjustments. For example, we change Rosy’s name to Rosie to
prevent the Charniak parser from treating it as an adjective, and we change the question “Has
the nest eggs in it, Frank?” to the American form “Does the nest have eggs in it, Frank?”. After
some automatic postprocessing of the parse tree, for example to mark prepositional phrases with
their type and to insert traces for dislocated constituents, our semantic interpreter successively
produces (i) an initial logical form (LF) by compositional rules; (ii) an indexical LF in which
quantifiers and connectives are fully scoped and intrasentential anaphors are resolved; (iii)
a deindexed LF with a speech act predicate and with explicit, temporally modified episodic
(event or situation) variables; and (iv) a set of canonicalized formulas derived from the previous
stage by Skolemization, negation scope narrowing, equality substitutions, separation of top-level
conjuncts, and other operations. The following are the parse trees and the set of Episodic Logic
formulas (Schubert & Hwang 2000) derived automatically from sentence (1) of our lead story:
PARSE TREES:
(FRAG (INTJ (UH OH)) (, ,) (NP (NNP ROSIE)) (. !)),
(SQ (AUX DO) (NP (PRP YOU))
(VP (VB SEE) (NP (DT THAT) (NN NEST))
(PP-IN (IN IN) (NP (DT THE)
(NN APPLE) (NN TREE)))) (. ?))
CANONICAL FORMULAS, WITH HEARER IDENTIFIED AS ROSIE:
(SPEAKER DIRECT-OH-TOWARDS*.V ROSIE.NAME),
(NEST8.SK NEST.N), (NEST8.SK NEW-SALIENT-ENTITY*.N),
(TREE9.SK ((NN APPLE.N) TREE.N)),
((SPEAKER ASK.V ROSIE.NAME
(QNOM (YNQ (ROSIE.NAME SEE.V NEST8.SK)))) ** E7.SK),
((SPEAKER ASK.V ROSIE.NAME
(QNOM (YNQ (ROSIE.NAME SEE.V
(THAT (NEST8.SK IN.P TREE9.SK)))))) * E7.SK)
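To indicate how such formulas might ultimately be grounded, the following minimal sketch (our
glue code, not the implemented system) gives a geometric counterpart of the “topless container”
rule from Section 2 and applies it to sentences 3–8 of the story: with NEST8.SK instantiated a
few meters up the tree, as in the prototype sketch above, an eye below the rim cannot see the
eggs, while an eye up in the tree can. Box is the type from the earlier sketches, and all
coordinates are assumptions.

def sees_contents(eye, container: Box) -> bool:
    """An eye sees into a topless container iff it is above the rim and its
    sight line to the interior passes down through the opening."""
    cx = (container.lo[0] + container.hi[0]) / 2   # a representative interior
    cy = (container.lo[1] + container.hi[1]) / 2   # point, just below
    cz = container.hi[2] - 0.01                    # the rim
    rim_z = container.hi[2]
    if eye[2] <= rim_z:
        return False                    # eye below the rim: the walls occlude
    t = (rim_z - eye[2]) / (cz - eye[2])   # where the sight line meets
    x = eye[0] + t * (cx - eye[0])         # the plane of the rim
    y = eye[1] + t * (cy - eye[1])
    return (container.lo[0] <= x <= container.hi[0] and
            container.lo[1] <= y <= container.hi[1])

nest = Box(lo=(2.0, 2.0, 3.5), hi=(2.15, 2.15, 3.58))  # up in the apple tree
print(sees_contents((2.5, 2.1, 1.4), nest))   # Rosie on the ground: False
print(sees_contents((2.1, 2.0, 3.9), nest))   # Frank up in the tree: True

This is exactly the inference needed at sentences 3 and 4: both children see the nest, but
neither can see whether it has eggs in it until one of them climbs.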

References

Anderson, J. R., & Bower, G. H. (1973). Human Associative Memory.

Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of Language, Inference,
and Consciousness. Harvard University Press.

Kosslyn, S. M. (1994). Image and Brain: The Resolution of the Imagery Debate. MIT Press.