Modeling Semantic Expectation:
Using Script Knowledge for Referent Prediction
Ashutosh Modi (1,3), Ivan Titov (2,4), Vera Demberg (1,3), Asad Sayeed (1,3), Manfred Pinkal (1,3)

1 {ashutosh,vera,asayeed,pinkal}@coli.uni-saarland.de
2 titov@uva.nl
3 Universität des Saarlandes, Germany
4 ILLC, University of Amsterdam, the Netherlands

Abstract
Recent research in psycholinguistics has provided increasing evidence that humans predict upcoming content. Prediction also affects perception and might be a key to robustness in human language processing. In this paper, we investigate the factors that affect human prediction by building a computational model that can predict upcoming discourse referents based on linguistic knowledge alone vs. linguistic knowledge jointly with common-sense knowledge in the form of scripts. We find that script knowledge significantly improves model estimates of human predictions. In a second study, we test the highly controversial hypothesis that predictability influences referring expression type but do not find evidence for such an effect.
1 Introduction
Being able to anticipate upcoming content is a
core property of human language processing (Kutas
et al., 2011; Kuperberg and Jaeger, 2016) that has re-
ceived a lot of attention in the psycholinguistic liter-
ature in recent years. Expectations about upcoming
words help humans comprehend language in noisy
settings and deal with ungrammatical input. In this
paper, we use a computational model to address the
question of how different layers of knowledge (lin-
guistic knowledge as well as common-sense knowl-
edge) influence human anticipation.
Here we focus our attention on semantic pre-
dictions of discourse referents for upcoming noun
phrases. This task is particularly interesting because
it allows us to separate the semantic task of antic-
ipating an intended referent and the processing of
the actual surface form. For example, in the con-
text of I ordered a medium sirloin steak with fries.
Later, the waiter brought . . . , there is a strong ex-
pectation of a specific discourse referent, i.e., the
referent introduced by the object NP of the preced-
ing sentence, while the possible referring expression
could be either the steak I had ordered, the steak,
our food, or it. Existing models of human predic-
tion are usually formulated using the information-
theoretic concept of surprisal. In recent work, how-
ever, surprisal is usually not computed for discourse referents (DRs),
which represent the relevant semantic unit, but for
the surface form of the referring expressions, even
though there is an increasing amount of literature
suggesting that human expectations at different lev-
els of representation have separable effects on pre-
diction and, as a consequence, that the modelling
of only one level (the linguistic surface form) is in-
sufficient (Kuperberg and Jaeger, 2016; Kuperberg,
2016; Zarcone et al., 2016). The present model ad-
dresses this shortcoming by explicitly modelling and
representing common-sense knowledge and concep-
tually separating the semantic (discourse referent)
and the surface level (referring expression) expec-
tations.
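
For reference, surprisal is standardly defined as the negative log-probability of a unit given its preceding context; in our setting the relevant unit is the discourse referent rather than its surface form:

$$\mathrm{surprisal}(u) \;=\; -\log p(u \mid \mathrm{context}).$$
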
Our discourse referent prediction task is related
to the NLP task of coreference resolution, but it
substantially differs from that task in the following
ways: 1) we use only the incrementally available left
context, while coreference resolution uses the full
text; 2) coreference resolution tries to identify the
DR for a given target NP in context, while we look
at the expectations of DRs based only on the context
before the target NP is seen.
The distinction between referent prediction and
prediction of referring expressions also allows us to
study a closely related question in natural language
generation: the choice of a type of referring expres-
sion based on the predictability of the DR that is
intended by the speaker. This part of our work is
inspired by a referent guessing experiment by Tily
and Piantadosi (2009), who showed that highly pre-
dictable referents were more likely to be realized
with a pronoun than unpredictable referents, which
were more likely to be realized using a full NP. The
effect they observe is consistent with a Gricean point
of view, or the principle of uniform information den-
sity (see Section 5.1). However, Tily and Piantadosi
do not provide a computational model for estimat-
ing referent predictability. Also, they do not include
selectional preference or common-sense knowledge
effects in their analysis.
We believe that script knowledge, i.e., common-
sense knowledge about everyday event sequences,
represents a good starting point for modelling con-
versational anticipation. This type of common-sense
knowledge includes temporal structure which is par-
ticularly relevant for anticipation in continuous lan-
guage processing. Furthermore, our approach can
build on progress that has been made in recent years
in methods for acquiring large-scale script knowl-
edge; see Section 1.1. Our hypothesis is that script
knowledge may be a significant factor in human an-
ticipation of discourse referents. Explicitly mod-
elling this knowledge will thus allow us to produce
more human-like predictions.
Script knowledge enables our model to generate
anticipations about discourse referents that have al-
ready been mentioned in the text, as well as anticipa-
tions about textually new discourse referents which
have been activated due to script knowledge. By
modelling event sequences and event participants,
our model captures many more long-range depen-
dencies than normal language models are able to. As
an example, consider the following two alternative
text passages:
We got seated, and had to wait for 20 minutes.
Then, the waiter brought the ...
We ordered, and had to wait for 20 minutes. Then,
the waiter brought the ...
Preferred candidate referents for the object posi-
tion of the waiter brought the ... are instances of
the food, menu, or bill participant types. In the con-
text of the alternative preceding sentences, there is a
strong expectation of instances of a menu and a food
participant, respectively.
This paper represents foundational research in-
vestigating human language processing. However,
it also has the potential for application in assistant
technology and embodied agents. The goal is to
achieve human-level language comprehension in re-
alistic settings, and in particular to achieve robust-
ness in the face of errors or noise. Explicitly mod-
elling expectations that are driven by common-sense
knowledge is an important step in this direction.
In order to be able to investigate the influence
of script knowledge on discourse referent expecta-
tions, we use a corpus that contains frequent refer-
ence to script knowledge, and provides annotations
for coreference information, script events and par-
ticipants (Section 2). In Section 3, we present a
large-scale experiment for empirically assessing hu-
man expectations on upcoming referents, which al-
lows us to quantify at what points in a text humans
have very clear anticipations vs. when they do not.
Our goal is to model human expectations, even if
they turn out to be incorrect in a specific instance.
The experiment was conducted via Mechanical Turk
and follows the methodology of Tily and Pianta-
dosi (2009). In Section 4, we describe our computa-
tional model that represents script knowledge. The
model is trained on the gold standard annotations of
the corpus, because we assume that human compre-
henders usually will have an analysis of the preced-
ing discourse which closely corresponds to the gold
standard. We compare the prediction accuracy of
this model to human predictions, as well as to two
baseline models in Section 4.3. One of them uses
only structural linguistic features for predicting ref-
erents; the other uses general script-independent se-
lectional preference features. In Section 5, we test
whether surprisal (as estimated from human guesses
vs. computational models) can predict the type of
referring expression used in the original texts in the
corpus (pronoun vs. full referring expression). This
experiment also has wider implications with respect
to the on-going discussion of whether the referring
expression choice is dependent on predictability, as
predicted by the uniform information density hypothesis.

(I)^(1)_{P_bather} [decided]_{E_wash} to take a (bath)^(2)_{P_bath} yesterday afternoon after working out. Once (I)^(1)_{P_bather} got back home, (I)^(1)_{P_bather} [walked]_{E_enter_bathroom} to (my)^(1)_{P_bather} (bathroom)^(3)_{P_bathroom} and first quickly scrubbed the (bathroom tub)^(4)_{P_bathtub} by [turning on]_{E_turn_water_on} the (water)^(5)_{P_water} and rinsing (it)^(4)_{P_bathtub} clean with a rag. After (I)^(1)_{P_bather} finished, (I)^(1)_{P_bather} [plugged]_{E_close_drain} the (tub)^(4)_{P_bathtub} and began [filling]_{E_fill_water} (it)^(4)_{P_bathtub} with warm (water)^(5)_{P_water} set at about 98 (degrees)^(6)_{P_temperature}.

Figure 1: An excerpt from a story in the InScript corpus. Referring expressions are in parentheses; the superscript gives the discourse referent label and the subscript the participant type, so referring expressions of the same discourse referent share the same superscript number. Script-relevant events are in square brackets, with the event type given as a subscript.

The contributions of this paper consist of:
- a large dataset of human expectations, in a variety of texts related to every-day activities.
- an implementation of the conceptual distinction between the semantic level of referent prediction and the type of a referring expression.
- a computational model which significantly improves modelling of human anticipations.
- showing that script knowledge is a significant factor in human expectations.
- testing the hypothesis of Tily and Piantadosi that the choice of the type of referring expression (pronoun or full NP) depends on the predictability of the referent.
1.1 Scripts
Scripts represent knowledge about typical event
sequences (Schank and Abelson, 1977), for exam-
ple the sequence of events happening when eating
at a restaurant. Script knowledge thereby includes
events like order, bring and eat as well as partici-
pants of those events, e.g., menu, waiter, food, guest.
Existing methods for acquiring script knowledge
are based on extracting narrative chains from text
(Chambers and Jurafsky, 2008; Chambers and Juraf-
sky, 2009; Jans et al., 2012; Pichotta and Mooney,
2014; Rudinger et al., 2015; Modi, 2016; Ahrendt
and Demberg, 2016) or by eliciting script knowledge
via crowdsourcing on Mechanical Turk (Regneri et
al., 2010; Frermann et al., 2014; Modi and Titov,
2014).
Modelling anticipated events and participants is
motivated by evidence showing that event repre-
sentations in humans contain information not only
about the current event, but also about previous
and future states, that is, humans generate anticipa-
tions about event sequences during normal language
comprehension (Schütz-Bosbach and Prinz, 2007).
Script knowledge representations have been shown
to be useful in NLP applications for ambiguity reso-
lution during reference resolution (Rahman and Ng,
2012).
2 Data: The InScript Corpus
Ordinary texts, including narratives, encode script
structure in a way that is too complex and too im-
plicit at the same time to enable a systematic study of
script-based expectation. They contain interleaved
references to many different scripts, and they usually
refer to single scripts in a point-wise fashion only,
relying on the ability of the reader to infer the full
event chain using their background knowledge.
We use the InScript corpus (Modi et al., 2016) to
study the predictive effect of script knowledge. In-
Script is a crowdsourced corpus of simple narrative
texts. Participants were asked to write about a spe-
cific activity (e.g., a restaurant visit, a bus ride, or a
grocery shopping event) which they personally ex-
perienced, and they were instructed to tell the story
as if explaining the activity to a child. This resulted
in stories that are centered around a specific scenario
and that explicitly mention mundane details. Thus,
they generally realize longer event chains associated
with a single script, which makes them particularly
appropriate to our purpose.
The InScript corpus is labelled with event-type,
participant-type, and coreference information. Full
verbs are labeled with event type information, heads
of all noun phrases with participant types, using
scenario-specific lists of event types (such as enter
bathroom, close drain and fill water for the “taking a
bath” scenario) and participant types (such as bather,
water and bathtub). On average, each template of-
fers a choice of 20 event types and 18 participant types.

(I)^(1) decided to take a (bath)^(2) yesterday afternoon after working out. Once (I)^(1) got back home, (I)^(1) walked to (my)^(1) (bathroom)^(3) and first quickly scrubbed the (bathroom tub)^(4) by turning on the (water)^(5) and rinsing (it)^(4) clean with a rag. After (I)^(1) finished, (I)^(1) plugged XXXXXX

Figure 2: An illustration of the Mechanical Turk experiment for the referent cloze task. Workers are supposed to guess the upcoming referent (indicated by XXXXXX above). They can either choose from the previously activated referents, or they can write something new. (Coreferent NPs, marked by color in the original interface, are indicated here by superscript numbers.)

Figure 3: Responses of the 20 workers for the story in Fig. 2. Workers guessed two already activated discourse referents (DRs): DR_4 (P_bathtub, 14 workers) and DR_1 (P_bather, 1 worker). Some of the workers (5) also chose the “new” option and wrote different lexical variants of “bathtub drain”, a new DR corresponding to the participant type “the drain”.

The InScript corpus consists of 910 stories ad-
dressing 10 scenarios (about 90 stories per scenario).
The corpus has 200,000 words, 12,000 verb in-
stances with event labels, and 44,000 head nouns
with participant instances. Modi et al. (2016) report
an inter-annotator agreement of 0.64 for event types
and 0.77 for participant types (Fleiss’ kappa).
We use gold-standard event- and participant-type
annotation to study the influence of script knowl-
edge on the expectation of discourse referents. In
addition, InScript provides coreference annotation,
which makes it possible to keep track of the men-
tioned discourse referents at each point in the story.
We use this information in the computational model
of DR prediction and in the DR guessing experiment
described in the next section. An example of an an-
notated InScript story is shown in Figure 1.
3 Referent Cloze Task
We use the InScript corpus to develop computa-
tional models for the prediction of discourse refer-
ents (DRs) and to evaluate their prediction accuracy.
This can be done by testing how often our models
manage to reproduce the original discourse referent
(cf. also the “narrative cloze” task of Chambers and Jurafsky (2008), which tests whether a verb together
with a role can be correctly guessed by a model).
However, we do not only want to predict the “cor-
rect” DRs in a text but also to model human expec-
tation of DRs in context. To empirically assess hu-
man expectation, we created an additional database
of crowdsourced human predictions of discourse ref-
erents in context using Amazon Mechanical Turk.
The design of our experiment closely resembles the
guessing game of Tily and Piantadosi (2009) but ex-
tends it in a substantial way.
Workers had to read stories of the InScript corpus¹ and guess upcoming participants: for each target
NP, workers were shown the story up to this NP ex-
cluding the NP itself, and they were asked to guess
the next person or object most likely to be referred
to. In case they decided in favour of a discourse ref-
erent already mentioned, they had to choose among
the available discourse referents by clicking an NP
in the preceding text, i.e., some noun with a specific,
coreference-indicating color; see Figure 2. Other-
wise, they would click the “New” button, and would
in turn be asked to give a short description of the new
person or object they expected to be mentioned. The
percentage of guesses that agree with the actually re-
ferred entity was taken as a basis for estimating the
surprisal.
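
A rough sketch of that estimate (the add-alpha smoothing here is an assumption of this sketch, since the text above does not specify how referents that no worker guessed are handled):

```python
from math import log2

def guess_based_surprisal(guess_counts, realized_dr, alpha=1.0, n_candidates=None):
    """Estimate the surprisal (in bits) of the realized discourse referent
    from the distribution of worker guesses at one cloze point.

    guess_counts: dict mapping each guessed candidate DR to the number of
                  workers who guessed it (20 guesses per target NP here).
    alpha / n_candidates: add-alpha smoothing -- an assumption of this sketch,
                  so a realized DR that no worker guessed still gets a
                  finite surprisal; the paper's exact scheme may differ.
    """
    total = sum(guess_counts.values())
    if n_candidates is None:
        # leave room for at least one candidate that nobody guessed
        n_candidates = len(guess_counts) + 1
    p = (guess_counts.get(realized_dr, 0) + alpha) / (total + alpha * n_candidates)
    return -log2(p)

# Illustrative numbers only: 20 guesses split over three candidates.
print(guess_based_surprisal({"DR_4": 14, "new:drain": 5, "DR_1": 1}, "DR_4"))
```
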
The experiment was done for all stories of the
test set: 182 stories (20%) of the InScript corpus,
evenly taken from all scenarios. Since our focus is
on the effect of script knowledge, we only consid-
ered those NPs as targets that are direct dependents
of script-related events. Guessing started from the
third sentence only in order to ensure that a mini-
mum of context information was available. To keep
the complexity of the context manageable, we re-
stricted guessing to a maximum of 30 targets and
skipped the rest of the story (this applied to 12%
of the stories). We collected 20 guesses per NP for
3346 noun phrase instances, which amounts to a to-
tal of around 67K guesses. Workers selected a context NP in 68% of cases and “New” in 32% of cases.

¹ The corpus is available at: http://www.sfb1102.uni-saarland.de/?page_id=2582
Our leading hypothesis is that script knowledge
substantially influences human expectation of dis-
course referents. The guessing experiment provides
a basis to estimate human expectation of already
mentioned DRs (the number of clicks on the respec-
tive NPs in text). However, we expect that script
knowledge has a particularly strong influence in the
case of first mentions. Once a script is evoked in a
text, we assume that the full script structure, includ-
ing all participants, is activated and available to the
reader.
Tily and Piantadosi (2009) are interested in sec-
ond mentions only and therefore do not make use
of the worker-generated noun phrases classified as
“New”. To study the effect of activated but not
explicitly mentioned participants, we carried out a
subsequent annotation step on the worker-generated
noun phrases classified as “New”. We presented an-
notators with these noun phrases in their contexts
(with co-referring NPs marked by color, as in the M-
Turk experiment) and, in addition, displayed all par-
ticipant types of the relevant script (i.e., the script as-
sociated with the text in the InScript corpus). Anno-
tators did not see the “correct” target NP. We asked
annotators to either (1) select the participant type in-
stantiated by the NP (if any), (2) label the NP as un-
related to the script, or (3), link the NP to an overt
antecedent in the text, in the case that the NP is ac-
tually a second mention that had been erroneously
labeled as new by the worker. Option (1) provides
a basis for a fine-grained estimation of first-mention
DRs. Option (3), which we added when we noticed
the considerable number of overlooked antecedents,
serves as correction of the results of the M-Turk ex-
periment. Out of the 22K annotated “New” cases,
39% were identified as second mentions, 55% were
linked to a participant type, and 6% were classified
as really novel.
4 Referent Prediction Model
In this section, we describe the model we use to
predict upcoming discourse referents (DRs).
4.1 Model
Our model should not only assign probabilities
to DRs already explicitly introduced in the preced-
ing text fragment (e.g., “bath” or “bathroom” for the
cloze task in Figure 2) but also reserve some prob-
ability mass for ‘new’ DRs, i.e., DRs activated via
the script context or completely novel ones not be-
longing to the script. In principle, different variants
of the activation mechanism must be distinguished.
For many participant types, a single participant be-
longing to a specific semantic class is expected (re-
ferred to with the bathtub or the soap). In contrast,
the “towel” participant type may activate a set of ob-
jects, elements of which then can be referred to with
a towel or another towel. The “bath means” partici-
pant type may even activate a group of DRs belong-
ing to different semantic classes (e.g., bubble bath
and salts). Since it is not feasible to enumerate all
potential participants, for ‘new’ DRs we only pre-
dict their participant type (“bath means” in our ex-
ample). In other words, the number of categories
in our model is equal to the number of previously
introduced DRs plus the number of participant types
of the script plus 1, reserved for a new DR not corre-
sponding to any script participant (e.g., cellphone).
In what follows, we slightly abuse the terminology
and refer to all these categories as discourse refer-
ents.
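
As an illustrative sketch of this category space (the function and the labels are ours, not the corpus' exact inventory), the candidate set at each prediction point can be enumerated as follows:

```python
def candidate_categories(mentioned_drs, script_participant_types):
    """Candidate 'discourse referents' in the sense used above: every DR
    already introduced in the preceding text, one category per script
    participant type (covering new DRs activated by the script), and one
    extra category for a new DR unrelated to the script."""
    return (list(mentioned_drs)
            + ["new:" + pt for pt in script_participant_types]
            + ["new:other"])

# Hypothetical state just before the cloze point in Figure 2 (labels are
# illustrative, not the corpus' exact participant-type list):
print(candidate_categories(
    ["DR_1:bather", "DR_2:bath", "DR_3:bathroom", "DR_4:bathtub", "DR_5:water"],
    ["bather", "bathtub", "water", "drain", "towel", "bath means"]))
```
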
Unlike standard co-reference models, which pre-
dict co-reference chains relying on the entire docu-
ment, our model is incremental, that is, when pre-
dicting a discourse referent $d^{(t)}$ at a given position $t$, it can look only in the history $h^{(t)}$ (i.e., the preceding part of the document), excluding the referring expression (RE) for the predicted DR. We also
assume that past REs are correctly resolved and as-
signed to correct participant types (PTs). Typical
NLP applications use automatic coreference reso-
lution systems, but since we want to model human
behavior, this might be inappropriate: an au-
tomated system would underestimate human perfor-
mance. This may be a strong assumption, but for
reasons explained above, we use gold standard past
REs.
We use the following log-linear model (“softmax regression”):

$$p(d^{(t)} = d \mid h^{(t)}) \;=\; \frac{\exp\big(w^{\top} f(d, h^{(t)})\big)}{\sum_{d'} \exp\big(w^{\top} f(d', h^{(t)})\big)},$$

where $f$ is the feature function we will discuss in the following subsection, $w$ are the model parameters, and the summation in the denominator is over the set of candidate discourse referents described above.
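
The scoring step itself is a plain softmax over the candidate categories. A minimal sketch (function and variable names are ours; the feature function f is kept abstract, since the concrete features are described in the following subsection):

```python
import numpy as np

def referent_distribution(candidates, history, w, f):
    """Softmax over candidate discourse referents, as in the equation above.

    f(d, history) must return a feature vector (np.ndarray) for candidate d
    given the preceding text; w is the learned weight vector."""
    scores = np.array([w @ f(d, history) for d in candidates])
    scores -= scores.max()            # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```
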