
An amorphous model for morphological processing in visual comprehension based on naive discriminative learning.


R. Harald Baayen
University of Alberta
Petar Milin
University of Novi Sad
Laboratory for Experimental Psychology, University of Belgrade
Dušica Filipović Đurđević
University of Novi Sad
Laboratory for Experimental Psychology, University of Belgrade
Peter Hendrix
University of Alberta
Marco Marelli
University of Milano-Bicocca
Abstract
A two-layer symbolic network model based on the equilibrium equations of
the Rescorla-Wagner model (Danks, 2003) is proposed. The study starts by
presenting two experiments in Serbian, which reveal for sentential reading
the inflectional paradigmatic effects previously observed by Milin, Filipović
Đurđević, and Moscoso del Prado Martín (2009) for unprimed lexical decision.
The empirical results are successfully modeled without having to assume
separate representations for inflections or data structures such as
inflectional paradigms. In the next step, the same naive discriminative
learning approach is pitted against a wide range of effects documented in the
morphological processing literature. Frequency effects for complex words as
well as for phrases (Arnon & Snider, 2010) emerge in the model without
whole-word or whole-phrase representations. Family size
effects (Schreuder & Baayen, 1997; Moscoso del Prado Martín, Bertram,
Häikiö, Schreuder, & Baayen, 2004) emerge in the simulations across simple
words, derived words, and compounds, without derived words or compounds
being represented as such. It is shown that for pseudo-derived words no
special morpho-orthographic segmentation mechanism, as posited by Rastle,
Davis, and New (2004), is required. The model also replicates the finding of
Plag and Baayen (2009) that, on average, words with more productive affixes
elicit longer response latencies, while at the same time predicting that
productive affixes afford faster response latencies for new words. English
phrasal paradigmatic effects modulating isolated word reading are reported
and modelled, showing that the paradigmatic effects characterizing Serbian
case inflection have cross-linguistic scope.
Keywords: naive discriminative learning, morphological processing, reading,
compound cue theory, Rescorla-Wagner equations, weighted relative entropy,
a-morphous morphology.

MORPHOLOGICAL PROCESSING WITH DISCRIMINATIVE LEARNING 2
In traditional views of morphology, just as simple words consist of phonemes, complex words
are composed of discrete morphemes. In this view, morphemes are signs linking form to
meaning. A word such as goodness is analysed as consisting of two signs, the free morpheme
good, and the bound morpheme -ness. When reading goodness, the constituents good and
-ness are parsed out, and subsequently the meaning of the whole word, “the quality of
being good” (in any of the various senses of good), is computed from the meanings of the
constituent morphemes.
The morphemic view has been very influential in psycholinguistic studies of morphological
processing. Many studies have addressed the question of whether the parsing of
a complex word into its constituents is an obligatory and automatic process (e.g., Taft &
Forster, 1975; Taft, 2004; Rastle et al., 2004) and have investigated the consequences of
such obligatory decomposition for words that are not morphologically complex (e.g., corner
versus walk-er, reindeer (not re-in-de-er) versus re-in-state). Priming manipulations
have been used extensively to show that morphological effects are stronger than would be
expected from form or meaning overlap alone (e.g., Feldman, 2000). Other studies have
addressed the consequences of the breakdown of compositionality, both for derived words such as
business (‘company’, not ‘the quality of being busy’) and for compounds such as hogwash (‘nonsense’)
(see, e.g., Marslen-Wilson, Tyler, Waksler, & Older, 1994; Libben, Gibson, Yoon, & Sandra,
2003; Schreuder, Burani, & Baayen, 2003). Furthermore, frequency effects have often been
used as diagnostics for the existence of representations, with whole-word frequency effects
providing evidence for representations for complex words, and morphemic frequency effects
pointing to morpheme-specific representations (e.g., Taft & Forster, 1976a; Taft, 1979, 1994;
Baayen, Dijkstra, & Schreuder, 1997).
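To make the notion of obligatory, purely form-based decomposition concrete, consider a toy affix-stripping routine. This is a minimal sketch and not any of the published models cited above: the stem and suffix lists (`STEMS`, `SUFFIXES`) and the function `strip_suffix` are hypothetical. Because it consults form only, it parses corner as corn + -er just as readily as walker as walk + -er.

```python
# Hypothetical toy lexicon: free stems and bound suffixes (illustration only).
STEMS = {"walk", "corn", "good"}
SUFFIXES = ("ness", "er")

def strip_suffix(word):
    """Obligatory, orthography-only decomposition: return (stem, suffix)
    if removing a listed suffix leaves a listed stem, otherwise None."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return stem, suffix
    return None

for w in ("walker", "corner", "goodness", "reindeer"):
    print(w, strip_suffix(w))
```

The spurious parse of corner is the kind of case that the pseudo-derived priming studies mentioned above are designed to probe.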
In current theoretical morphology, however, the morpheme does not play an important
role. One reason is that, contrary to what one would expect for a linguistic sign, bound
morphemes often express a range of very different meanings. In English, the formative -er is
used for deverbal nouns (walk-er) but also for comparatives (greater). The suffix -s indicates
plural on nouns (legs), third person singular on verbs (walks), and also the possessive (John’s legs). In
highly inflecting languages such as Serbian, the case ending -i indicates dative or locative
singular for regular feminine nouns (a class), but nominative plural for masculine nouns.
A second reason is that formatives often pack together several meanings, often only
semi-systematically. For instance, in Latin, the formatives for the present passive contain
an r as part of their form, but this r can appear initially (-r, -ris, first and second person
singular) or finally (-tur, -mur, -ntur, third person singular, first and third person plural).
The exception is the formative for the second person plural, which does not contain an r
at all (-mini). Thus, the presence of an r in a verb ending is a good, although not perfect,
indicator of passive meaning. To complicate matters even further, the very same passive
formatives are used on selected verbs to express active instead of passive meaning, indicating
that the interpretation of these formatives is highly context-dependent. This is not what
one would expect if these formatives were bona fide linguistic signs.

We are indebted for comments, discussion, and feedback to two reviewers, Jorn and Corine Baayen, Jim
Blevins, Geert Booij, Paula Chesley, Victor Kuperman, Janet Pierrehumbert, Ingo Plag, Michael Ramscar,
and audiences at presentations in Tübingen, Tucson, Freiburg, Pisa, Siegen, Groningen, York, and San
Diego. This work was partially supported by the Ministry of Science and Environmental Protection of the
Republic of Serbia (grant number: 149039D).

A third reason is that some languages shamelessly reuse inflected forms as input
for further case inflections, as exemplified by Estonian non-nominative plural case endings
attaching to the partitive singular (Erelt, 2003). For instance, jalg (‘foot’, nominative) has
as singular case endings forms such as jalga (partitive), jala (genitive) and jalast (elative).
The corresponding plural case endings are jalad (nominative), jalgasid (partitive), jalgade
(genitive) and jalgadest (elative). Even though the form of the partitive singular is present
in the plural non-nominative case endings, it does not make any semantic contribution to
these plural forms (and is therefore often analysed as a stem allomorph).
A fourth reason is that form-meaning relationships can be present without the need
for morphemic decomposition. Phonaesthemes, such as gl- in glow, glare, gloom, gleam,
glimmer and glint, provide one example; the initial wh of the question words of English
(who, why, which, whether, where, ...) provides another (Bloomfield, 1933). Furthermore,
blends (e.g., brunch, from breakfast and lunch) share aspects of compositionality without
allowing a normal parse (see, e.g., Gries, 2004, 2006).
A fifth reason is that inflectional formatives often express several grammatical meanings
simultaneously. For instance, the inflectional exponent -a for Serbian regular feminine
nouns expresses either nominative and singular, or genitive and plural. Ordinary signs
such as tree may also have various shades of meaning (such as ‘any perennial woody plant
of considerable size’, ‘a piece of timber’, ‘a cross’, ‘gallows’), but these different shades of
meaning are usually not intended simultaneously in the way that nominative and singular
(or genitive and plural) are expressed simultaneously by the exponent -a.
A final reason is that in richly inflecting languages, the interpretation of an inflectional
formative depends on the inflectional paradigm of the base word it attaches to. For
instance, the above-mentioned Serbian case ending -a can denote not only nominative singular
or genitive plural for regular feminine nouns, but also genitive singular and plural for
regular masculine nouns. Moreover, for a subclass of masculine animate nouns, accusative
singular forms make use of the same exponent -a. The ambiguity of this case ending is
resolved, however, if one knows the dative/instrumental/locative plural endings for feminine
and masculine nouns (-ama vs. -ima, respectively). In other words, resolving the ambiguity
of a case ending depends not only on contextual information in the preceding or following
discourse (syntagmatic information), but also on knowledge of the other inflected forms in
which a word can appear (paradigmatic information).
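The role of paradigmatic information can be illustrated with a small lookup sketch. It encodes only the facts stated in this paragraph; the class labels, the reading labels, and the function `readings_of_a` are hypothetical, and real Serbian declension involves more classes and exponents. Knowing which dative/instrumental/locative plural ending a noun takes narrows the four candidate readings of -a to those available to feminine nouns or to masculine nouns.

```python
# Readings of the Serbian exponent -a per inflectional class, restricted to the
# facts stated in the text; the class labels are hypothetical names.
READINGS_OF_A = {
    "feminine": {"nom.sg", "gen.pl"},
    "masculine": {"gen.sg", "gen.pl"},
    "masculine_animate": {"gen.sg", "gen.pl", "acc.sg"},  # animate subclass adds acc.sg
}

# Dative/instrumental/locative plural exponent of each class.
DIL_PLURAL = {"feminine": "-ama", "masculine": "-ima", "masculine_animate": "-ima"}

def readings_of_a(dil_plural=None):
    """Candidate readings of -a; if the noun's dative/instrumental/locative
    plural ending is known, keep only the classes that use that ending."""
    classes = [c for c, ending in DIL_PLURAL.items() if dil_plural in (None, ending)]
    return set().union(*(READINGS_OF_A[c] for c in classes))

print(sorted(readings_of_a()))        # no paradigmatic knowledge
print(sorted(readings_of_a("-ama")))  # the noun is known to take -ama
```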
Considerations such as these suggest that the metaphor of morphology as a formal
calculus with morphemes as basic symbols, and morphological rules defining well-formed
strings as well as providing a semantic interpretation, much as a pocket calculator interprets
2 + 3 as 5, is inappropriate. Many studies of word formation have concluded that more
insightful analyses can be obtained by taking the word as the basic unit of morphological
analysis (for details, and more complex arguments against a beads-on-a-string model of
morphology (also known as ‘item-and-arrangement morphology’), see, e.g., Matthews, 1974;
Hockett, 1987; S. Anderson, 1992; Aronoff, 1994; Beard, 1995; Blevins, 2003, 2006; Booij,
2010).
The following quote from Hockett (1987, p. 84) is informative, especially as in early work
Hockett himself had helped develop an ‘item-and-arrangement’ model of morphology that
he later regarded as inadequate:

In 1953 Floyd Lounsbury tried to tell us what we were doing with our clever
morphophonemic techniques. We were providing alternations by devising an
‘agglutinative analog’ of the language and formulating rules that would convert
expressions in that analog into the shapes in which they are actually uttered.
Of course, even such an agglutinative analog, with its accompanying conversion
rules, could be interpreted merely as a descriptive device. But it was not in
general taken that way; instead, it was taken as a direct reflection of reality.
We seemed to be convinced that, whatever might superficially appear to be the
case, every language is ‘really’ agglutinative.
It is worth noting that in a regular agglutinating language such as Turkish, morphological
formatives can be regarded as morphemes contributing their own meanings in a compositional
calculus. However, in order to understand morphological processing across human languages,
a general algorithmic theory is required that covers both the many non-agglutinative
systems and the more agglutinative ones.
If the trend in current linguistic morphology is moving in the right direction, the
questions of whether and how a complex word is decomposed during reading into its constituent
morphemes are not the optimal questions to pursue. A first relevant question
in ‘a-morphous’ approaches to morphological processing is how a complex word activates
the proper meanings, without necessarily assuming intermediate representations supposedly
negotiating between the orthographic input and semantics. A second important question
concerns the role of paradigmatic relations during lexical processing.
Of the many models proposed for morphological processing in the psycholinguistic
literature, the insights of a-morphous morphology fit best with aspects of the triangle
model of Harm and Seidenberg (1999), Seidenberg and Gonnerman (2000), Plaut and
Gonnerman (2000), and Harm and Seidenberg (2004). This connectionist model maps orthographic
input units onto semantic units without intervening morphological units. The triangle
model also incorporates phonological knowledge, seeking to simulate reading aloud
within one unified system highly sensitive to the distributional properties of the input,
where other models posit two separate streams (orthography to meaning, and orthography
to phonology, see, e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Borowsky et al.,
2006).
In what follows, we propose a computational model, the “naive discriminative reader”,
which models morphological processing with an architecture directly mapping form onto
meaning, without using specific representations for either bound morphemes or for complex
words. The model builds on the triangle model but differs from it in four ways. First, it works
with just two levels, orthography and meaning. In this study, we do not address reading
aloud, focusing instead on developing a model that properly predicts morphological effects
in comprehension. Second, there are no hidden layers mediating the mapping of form onto
meaning. Third, the representations that we use for coding the orthographic input and
semantic output are symbolic rather than subsymbolic. Fourth, our model makes use of
a simple algorithm based on discriminative learning to efficiently estimate the weights on
the connections from form to meaning, instead of backpropagation. The research strategy
pursued in the present study is to formulate the simplest probabilistic architecture that is
sufficiently powerful to predict the kind of morphological effects documented in the processing
literature.
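The equilibrium equations mentioned in the abstract can be sketched in a few lines. As we read Danks (2003), at equilibrium the weights V_j from cues c_j to an outcome o satisfy Pr(o | c_i) = Σ_j Pr(c_j | c_i) V_j for every cue c_i, so the weights follow from co-occurrence counts by solving one linear system, without iterating over trials. The sketch is ours, not the authors' implementation: it assumes letter unigrams and bigrams, with # as a hypothetical word-boundary marker, as orthographic cues, uses invented frequencies, and solves the (typically singular) system with a pseudo-inverse. The activation of a meaning is taken to be the sum of the weights from the word's cues.

```python
import numpy as np

def letter_cues(word):
    """Orthographic cues (assumed): letter unigrams plus boundary-marked bigrams."""
    padded = f"#{word}#"
    return set(word) | {padded[i:i + 2] for i in range(len(padded) - 1)}

def equilibrium_weights(events):
    """events: (word, frequency, set of meanings). Returns the cue list, the
    meaning list and a cues x meanings matrix W solving
    Pr(o | c_i) = sum_j Pr(c_j | c_i) * W[j, o] for every cue i and meaning o."""
    cues = sorted({c for word, _, _ in events for c in letter_cues(word)})
    meanings = sorted({m for _, _, ms in events for m in ms})
    cue_idx = {c: k for k, c in enumerate(cues)}
    mean_idx = {m: k for k, m in enumerate(meanings)}
    cc = np.zeros((len(cues), len(cues)))      # frequency-weighted cue-cue co-occurrences
    cm = np.zeros((len(cues), len(meanings)))  # frequency-weighted cue-meaning co-occurrences
    for word, freq, ms in events:
        rows = [cue_idx[c] for c in letter_cues(word)]
        for i in rows:
            for j in rows:
                cc[i, j] += freq
            for m in ms:
                cm[i, mean_idx[m]] += freq
    n_cue = np.diag(cc).copy()                 # n(c_i): frequency of events containing c_i
    p_cue_given_cue = cc / n_cue[:, None]      # row i holds Pr(c_j | c_i)
    p_mean_given_cue = cm / n_cue[:, None]     # row i holds Pr(o | c_i)
    # The system is usually singular; the pseudo-inverse returns an exact solution
    # of the consistent system, with rcond discarding numerically zero singular values.
    W = np.linalg.pinv(p_cue_given_cue, rcond=1e-10) @ p_mean_given_cue
    return cues, meanings, W

def activation(word, meaning, cues, meanings, W):
    """Summed weights from the word's cues to one meaning."""
    rows = [cues.index(c) for c in letter_cues(word) if c in cues]
    return float(W[rows, meanings.index(meaning)].sum())

# Invented toy frequencies: there are cues and meanings, but no whole-word units.
events = [
    ("hand", 10, {"HAND"}),
    ("hands", 5, {"HAND", "PLURAL"}),
    ("land", 8, {"LAND"}),
    ("lands", 3, {"LAND", "PLURAL"}),
]
cues, meanings, W = equilibrium_weights(events)
for word in ("hand", "lands"):
    print(word, {m: round(activation(word, m, cues, meanings, W), 3) for m in meanings})
```

In this toy set each meaning is perfectly predictable from a group of cues (h for HAND, l for LAND, s for PLURAL), so the activations come out at approximately 1 for the meanings a word carries and approximately 0 for the others. With realistic data the equations no longer have such a clean solution, and the activations become graded.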

Of special interest to our modeling effort are two general classes of phenomena that
suggest a form of ‘entanglement’ of words with morphologically related words during lexical
processing. Schreuder and Baayen (1997) documented for simple words that the type
count of morphologically related words co-determines processing latencies in visual lexical
decision. This ‘family size’ effect has been replicated for complex words and also emerges
in languages such as Hebrew and Finnish (De Jong, Schreuder, & Baayen, 2000;
Moscoso del Prado Martín, Kostić, & Baayen, 2004; Moscoso del Prado Martín et al., 2005;
Moscoso del Prado Martín et al., 2004; Baayen, 2010). One interpretation of the family
size effect, formulated within the framework of the multiple read-out model of Grainger and
Jacobs (1996), assumes that a word with a large family co-activates many family members,
thereby creating more lexical activity and hence providing more evidence for a yes-response
in lexical decision. Another explanation assumes that resonance within the network of family
members boosts the activation of the input word (De Jong, Schreuder, & Baayen, 2003).
In the present study, we pursue a third explanation, following Moscoso del Prado Martín
(2003, chapter 10), according to which family size effects can emerge straightforwardly in
networks mapping forms onto meanings.
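Family size is a type count: the number of distinct words in which a given word occurs as a constituent, irrespective of their token frequencies. A minimal sketch over a hypothetical toy lexicon follows; the words, their analyses, and the function `family_size` are illustrative assumptions. Whether the base word itself is counted is a convention that varies; here it is excluded.

```python
# Hypothetical toy lexicon: each word mapped to the base words it contains.
LEXICON = {
    "work": {"work"},
    "worker": {"work"},
    "workshop": {"work", "shop"},
    "homework": {"home", "work"},
    "shop": {"shop"},
    "shopper": {"shop"},
    "home": {"home"},
}

def family_size(base, lexicon):
    """Number of word types other than `base` that contain `base` as a constituent."""
    return sum(1 for word, parts in lexicon.items() if base in parts and word != base)

print({b: family_size(b, LEXICON) for b in ("work", "shop", "home")})
# {'work': 3, 'shop': 2, 'home': 1}
```

Note that compounds such as workshop count toward the families of both constituents, which is how family size can be defined across simple words, derived words, and compounds alike.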
The second class of phenomena of interest to us revolves around the processing of
inflected words that enter into extensive, highly structured paradigmatic relations with
other inflected words. Milin, Filipović Đurđević, and Moscoso del Prado Martín (2009)
showed, for Serbian nouns inflected for case and number, that response latencies in the
visual lexical decision task are co-determined by both the probabilities of a word’s other
case endings, and the probabilities of these case endings in that word’s inflectional class.
More precisely, the more a given word’s probability distribution of case inflections differs
from the corresponding distribution of its inflectional class, the longer response latencies
are.
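A natural way to quantify how far the two distributions differ is relative entropy (Kullback-Leibler divergence), D(P‖Q) = Σ_e P(e) log2(P(e)/Q(e)), where P is the word's distribution over its inflected forms and Q the corresponding distribution for its inflectional class. The sketch below assumes raw frequency counts as input and returns bits; the counts and form labels are invented, and it does not implement the weighted variant listed among the keywords.

```python
import math

def relative_entropy(word_counts, class_counts):
    """D(P || Q) in bits, with P estimated from the word's counts per inflected
    form and Q from the inflectional class's counts for the same forms."""
    n_word = sum(word_counts.values())
    n_class = sum(class_counts.values())
    d = 0.0
    for form, count in word_counts.items():
        if count == 0:
            continue                      # 0 * log(0 / q) contributes nothing
        p = count / n_word
        q = class_counts.get(form, 0) / n_class
        if q == 0:
            raise ValueError(f"form {form!r} has zero probability in the class")
        d += p * math.log2(p / q)
    return d

# Invented counts for three forms of one noun and of its inflectional class.
word = {"nom.sg": 2, "gen.sg": 1, "acc.sg": 1}
klass = {"nom.sg": 100, "gen.sg": 100, "acc.sg": 200}
print(relative_entropy(word, klass))  # 0.25
```

A value of 0 means the word is used exactly as its class predicts; larger values mark words whose usage departs further from their class, which is the situation associated above with longer response latencies.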
There are two main options for understanding these results. Under one interpreta-
tion, case-inflected variants are stored in memory, with computations over paradigmatically
structured sets of exemplars giving rise to the observed effects. This explanation is extremely
costly in the number of lexical representations that have to be assumed to be available in
memory. We therefore pursue a different explanation, one that is extremely parsimonious
in the number of representations required. We will show that these paradigmatic effects can
arise in a simple discriminative network associating forms with meanings. Crucially, the
network does not contain any representations for complex words; instead, the network embodies
a fully compositional probabilistic memory activating meanings given forms.
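Danks's equations characterize the point at which the expected Rescorla-Wagner update is zero; the network can also be trained trial by trial. A minimal sketch of that learning rule follows, under stated assumptions: the cue coding (letter unigrams plus bigrams with a hypothetical # boundary marker), the toy events, and a single learning rate `rate` standing in for the product of the salience parameters α and β are our choices, not the authors'. The network stores only cue-to-meaning weights, so an inflected form such as hands activates HAND and PLURAL through its letters rather than through a unit of its own.

```python
import random
from collections import defaultdict

def letter_cues(word):
    """Orthographic cues (assumed): letter unigrams plus boundary-marked bigrams."""
    padded = f"#{word}#"
    return set(word) | {padded[i:i + 2] for i in range(len(padded) - 1)}

def train_rescorla_wagner(events, epochs=200, rate=0.01, lam=1.0, seed=1):
    """Trial-by-trial Rescorla-Wagner learning over (word, frequency, meanings)
    events; each event is presented `frequency` times per epoch, in shuffled order."""
    weights = defaultdict(float)                  # (cue, meaning) -> association strength
    meanings = sorted({m for _, _, ms in events for m in ms})
    trials = [(letter_cues(word), ms) for word, freq, ms in events for _ in range(freq)]
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(trials)
        for cues, present in trials:
            for m in meanings:
                prediction = sum(weights[(c, m)] for c in cues)
                target = lam if m in present else 0.0   # lambda = 0 when the meaning is absent
                delta = rate * (target - prediction)
                for c in cues:                          # only cues present on the trial change
                    weights[(c, m)] += delta
    return weights

def activation(word, meaning, weights):
    """Summed weights from the word's cues to one meaning (no whole-word unit involved)."""
    return sum(weights.get((c, meaning), 0.0) for c in letter_cues(word))

events = [
    ("hand", 10, {"HAND"}),
    ("hands", 5, {"HAND", "PLURAL"}),
    ("land", 8, {"LAND"}),
    ("lands", 3, {"LAND", "PLURAL"}),
]
weights = train_rescorla_wagner(events)
print(round(activation("hands", "PLURAL", weights), 2))
```

On this consistent toy set the weights settle close to the equilibrium solution, so the printed activation should be close to 1. With noisier, realistic input and a constant learning rate, the weights keep fluctuating around the equilibrium rather than settling exactly, which is one reason for working with the equilibrium equations directly.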
Although in generative grammar, morphology and syntax have been strictly separated
(for an exception, see, e.g., Lieber, 1992), approaches within the general framework of
construction grammar (Goldberg, 2006; Booij, 2005, 2009; Dabrowska, 2009; Booij, 2010)
view the distinction between morphology and syntax as gradient. In this framework, the
grammar is an inventory of constructions relating form to meaning. From a structural
perspective, morphological constructions differ from phrasal or syntactic constructions only
in lesser internal complexity. From a processing perspective, morphological constructions,
being smaller, should be more likely to leave traces in memory than syntactic constructions.
However, at the boundary, similar familiarity effects due to past experience are predicted
to arise for both larger complex words and smaller word n-grams. Interestingly, frequency
effects have been established not only for (regular) morphologically complex words (see,
