Why is the response latency predicted to be shorter?

Due to greater activation of its lexical meaning (and its grammatical meanings), the response latency to a longer word is predicted to be shorter.

What do the authors need for modeling morphological effects?

All the authors need for modeling morphological effects is a (symbolic) layer of orthographic nodes (unigrams and bigrams) and a (symbolic) layer of meanings.

What was the only predictor for which by-participant random slopes were supported?

Trial was the only predictor for which by-participant random slopes (for the quadratic term of Trial only) were supported by a likelihood ratio test.

What is the effect of low token frequencies and many types on item-specific learning?

Low token frequencies and many types lead to reduced item-specific learning, with, as a flip side, better generalisation to previously unseen words.

Why is the naive discriminative reader predicting shorter response latencies?

The naive discriminative reader predicts that bigram troughs also should give rise to shorter response latencies, but not because morphological decomposition would proceed more effectively.

What is the effect of orthographic familiarity on the reading time measures?

Orthographic familiarity has a significant (albeit small) facilitatory effect on several reading time measures, independently of word-frequency effects.

What is the probability that a marble is drawn from the vase without replacement?

When a marble is drawn from the vase without replacement, the likelihood that its color occurs once only is equal to the ratio of the number of colors with frequency 1 (V1) to the total number of marbles (N), for the present example leading to the probability (2/40).
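The ratio described above can be checked with a short sketch. The vase contents below are hypothetical, chosen only so that N = 40 and V1 = 2 as in the quoted example; the function name is likewise an illustration, not something taken from the paper.

```python
from collections import Counter

def prob_color_occurs_once(marbles):
    """Return V1 / N: the number of colors seen exactly once over the total count."""
    counts = Counter(marbles)
    n = len(marbles)                                   # N: total number of marbles
    v1 = sum(1 for c in counts.values() if c == 1)     # V1: colors with frequency 1
    return v1 / n

# Hypothetical vase (an assumption): 40 marbles, two colors occurring once each.
vase = ["red"] * 20 + ["blue"] * 18 + ["green"] + ["yellow"]
print(prob_color_occurs_once(vase))  # 2/40
```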

How many distances are needed to estimate the posterior probabilities of the discriminative learning model?

Yet no fewer than 2,238,324,000,000 distances would have to be evaluated to estimate the posterior probabilities of just these phrases in the Bayesian Reader approach.

How many weights are needed to compute the discriminative learning model?

Even for the small data set of Serbian nouns, the number of distances the Easy Bayesian Reader has to compute is already 15 times the number of weights that need to be set in the discriminative learning model.

What is the number of representations required for a naive discriminative reader?

The naive discriminative reader is also sparse in the number of representations required: at the orthographic level, letter unigrams and bigrams, and at the semantic level, meaning representations for simple words, inflectional meanings such as case and number, and the meanings of derivational affixes.

What is the effect of multiple fixations and saccades on the processing costs for longer words?

The increased processing costs for longer words are, in the present approach, the straightforward consequence of multiple fixations and saccades, a physiological factor unrelated to discriminative learning.

What is the meaning of the adjective fruitless?

The adjective fruitless is opaque when considered in isolation: the meaning ‘in vain’, ‘unprofitable’ seems unrelated to the meaning of the base, fruit.

What does the naive discriminative learning framework require to evaluate?

The naive discriminative learning framework, in which relative entropy effects emerge naturally, by contrast, imposes very limited demands on memory, and also does not require a separate process evaluating an exemplar’s distance to the prototype.

What is the degree of productivity of the derivational process?

Inflectional morphology tends to be quite regular (the irregular past tenses of English being exceptional), but derivational processes are characterized by degrees of productivity.

(Open Access) An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. (2011) | R. Harald Baayen

Q: What are the future works mentioned in the paper "An amorphous model for morphological processing in visual comprehension based on naive discriminative learning"?

Nevertheless, the authors think it is worth considering that the simpler explanation may be on the right track. One of the central questions for the cognition of language that they put forward is whether the very different language systems of the world can be acquired by the same general learning strategies (p. 447). Finally, even for responses in visual lexical decision, the naive discriminative reader provides a high-level characterization of contextual learning that at the level of cortical learning may be more adequately modeled by hierarchical temporal memory systems (Hawkins & Blakeslee, 2004; Numenta, 2010).

An amorphous model for morphological processing in visual comprehension based on naive discriminative learning

R. Harald Baayen

University of Alberta

Petar Milin

University of Novi Sad

Laboratory for Experimental Psychology, University of Belgrade

Dušica Filipović Đurđević

University of Novi Sad

Laboratory for Experimental Psychology, University of Belgrade

Peter Hendrix

University of Alberta

Marco Marelli

University of Milano-Bicocca

Abstract

A two-layer symbolic network model based on the equilibrium equations of the Rescorla-Wagner model (Danks, 2003) is proposed. The study starts by presenting two experiments in Serbian, which reveal for sentential reading the inflectional paradigmatic effects previously observed by Milin, Filipović Đurđević, and Moscoso del Prado Martín (2009) for unprimed lexical decision. The empirical results are successfully modeled without having to assume separate representations for inflections or data structures such as inflectional paradigms. In the next step, the same naive discriminative learning approach is pitted against a wide range of effects documented in the morphological processing literature. Frequency effects for complex words as well as for phrases (Arnon & Snider, 2010) emerge in the model without the presence of whole-word or whole-phrase representations. Family size effects (Schreuder & Baayen, 1997; Moscoso del Prado Martín, Bertram, Häikiö, Schreuder, & Baayen, 2004) emerge in the simulations across simple words, derived words, and compounds, without derived words or compounds being represented as such. It is shown that for pseudo-derived words no special morpho-orthographic segmentation mechanism as posited by Rastle, Davis, and New (2004) is required. The model also replicates the finding of Plag and Baayen (2009), that, on average, words with more productive affixes elicit longer response latencies, while at the same time predicting that productive affixes afford faster response latencies for new words. English phrasal paradigmatic effects modulating isolated word reading are reported and modelled, showing that the paradigmatic effects characterizing Serbian case inflection have cross-linguistic scope.

Keywords: naive discriminative learning, morphological processing, reading, compound cue theory, Rescorla-Wagner equations, weighted relative entropy, a-morphous morphology.

MORPHOLOGICAL PROCESSING WITH DISCRIMINATIVE LEARNING 2

In traditional views of morphology, just as simple words consist of phonemes, complex words are composed of discrete morphemes. In this view, morphemes are signs linking form to meaning. A word such as goodness is analysed as consisting of two signs, the free morpheme good, and the bound morpheme -ness. When reading goodness, the constituents good and -ness are parsed out, and subsequently the meaning of the whole word, “the quality of being good” (in any of the various senses of good) is computed from the meanings of the constituent morphemes.

The morphemic view has been very influential in psycholinguistic studies of morphological processing. Many studies have addressed the question of whether the parsing of a complex word into its constituents is an obligatory and automatic process (e.g., Taft & Forster, 1975; Taft, 2004; Rastle et al., 2004) and have investigated the consequences of such obligatory decomposition for words that are not morphologically complex (e.g., corner versus walk-er, reindeer (not re-in-de-er) versus re-in-state). Priming manipulations have been used extensively to show that morphological effects are stronger than would be expected from form or meaning overlap alone (e.g., Feldman, 2000). Other studies have addressed the consequences of the breakdown of compositionality, both for derived words (business, ‘company’, not ‘the quality of being busy’) and compounds (hogwash, ‘nonsense’) (see, e.g., Marslen-Wilson, Tyler, Waksler, & Older, 1994; Libben, Gibson, Yoon, & Sandra, 2003; Schreuder, Burani, & Baayen, 2003). Furthermore, frequency effects have often been used as diagnostics for the existence of representations, with whole-word frequency effects providing evidence for representations for complex words, and morphemic frequency effects pointing to morpheme-specific representations (e.g., Taft & Forster, 1976a; Taft, 1979, 1994; Baayen, Dijkstra, & Schreuder, 1997).

In current theoretical morphology, however, the morpheme does not play an important role. One reason is that, contrary to what one would expect for a linguistic sign, bound morphemes often express a range of very different meanings. In English, the formative -er is used for deverbal nouns (walk-er) but also for comparatives (greater). The suffix -s indicates plural on nouns (legs), singular on verbs (walks), and also the possessive (John’s legs). In highly inflecting languages such as Serbian, the case ending -i indicates dative or locative singular for regular feminine nouns (a class), but nominative plural for masculine nouns.

A second reason is that formatives often pack together several meanings, often only semi-systematically. For instance, in Latin, the formatives for the present passive contain an r as part of their form, but this r can appear initially (-r, -ris, first and second person singular) or finally (-tur, -mur, -ntur, third person singular, first and third person plural). The exception is the formative for the second person plural, which does not contain an r at all (-mini). Thus, the presence of an r in a verb ending is a good, although not perfect, indicator of passive meaning. To complicate matters even further, the very same passive formatives are used on selected verbs to express active instead of passive meaning, indicating that the interpretation of these formatives is highly context-dependent. This is not what one would expect if these formatives were bona fide linguistic signs.

We are indebted for comments, discussion, and feedback to two reviewers, Jorn and Corine Baayen, Jim Blevins, Geert Booij, Paula Chesley, Victor Kuperman, Janet Pierrehumbert, Ingo Plag, Michael Ramscar, and audiences at presentations in Tübingen, Tucson, Freiburg, Pisa, Siegen, Groningen, York, and San Diego. This work was partially supported by the Ministry of Science and Environmental Protection of the Republic of Serbia (grant number: 149039D).

A third reason is that some languages shamelessly reuse inflected forms as input for further case inflections, as exemplified by Estonian non-nominative plural case endings attaching to the partitive singular (Erelt, 2003). For instance, jalg (‘foot’, nominative) has singular case forms such as jalga (partitive), jala (genitive) and jalast (elative). The corresponding plural case forms are jalad (nominative), jalgasid (partitive), jalgade (genitive) and jalgadest (elative). Even though the form of the partitive singular is present in the plural non-nominative case forms, it does not make any semantic contribution to these plural forms (and is therefore often analysed as a stem allomorph).

A fourth reason is that form-meaning relationships can be present without the need of morphemic decomposition. Phonaesthemes, such as gl- in glow, glare, gloom, gleam, glimmer and glint, provide one example; the initial wh of the question words of English (who, why, which, whether, where, ...) provides another (Bloomfield, 1933). Furthermore, blends (e.g., brunch, from breakfast and lunch) share aspects of compositionality without allowing a normal parse (see, e.g., Gries, 2004, 2006).

A fifth reason is that inflectional formatives often express several grammatical meanings simultaneously. For instance, the inflectional exponent a for Serbian regular feminine nouns expresses either nominative and singular, or genitive and plural. Similarly, normal signs such as tree may have various shades of meaning (such as ‘any perennial woody plant of considerable size’, ‘a piece of timber’, ‘a cross’, ‘gallows’), but these different shades of meaning are usually not intended simultaneously in the way that nominative and singular (or genitive and plural) are expressed simultaneously by the a exponent.

A final reason is that in richly inflecting languages, the interpretation of an inflectional formative depends on the inflectional paradigm of the base word it attaches to. For instance, the abovementioned Serbian case ending -a can denote not only nominative singular or genitive plural for regular feminine nouns, but also genitive singular and plural for regular masculine nouns. Moreover, for a subclass of masculine animate nouns, accusative singular forms make use of the same exponent -a. The ambiguity of this case ending is resolved, however, if one knows dative/instrumental/locative plural endings for feminine and masculine nouns (-ama vs. -ima, respectively). In other words, resolving the ambiguity of a case ending depends not only on contextual information in the preceding or following discourse (syntagmatic information), but also on knowledge of the other inflected forms in which a word can appear (paradigmatic information).

Considerations such as these suggest that the metaphor of morphology as a formal calculus with morphemes as basic symbols, and morphological rules defining well-formed strings as well as providing a semantic interpretation, much as a pocket calculator interprets 2 + 3 as 5, is inappropriate. Many studies of word formation have concluded that more insightful analyses can be obtained by taking the word as the basic unit of morphological analysis (for details, and more complex arguments against a beads-on-a-string model of morphology (also known as ‘item-and-arrangement morphology’), see, e.g., Matthews, 1974; Hockett, 1987; S. Anderson, 1992; Aronoff, 1994; Beard, 1995; Blevins, 2003, 2006; Booij, 2010).

The following quote from Hockett (1987:84) is informative, especially as in early work Hockett himself had helped develop an ‘item-and-arrangement’ model of morphology that he later regarded as inadequate:


In 1953 Floyd Lounsbury tried to tell us what we were doing with our clever morphophonemic techniques. We were providing alternations by devising an ‘agglutinative analog’ of the language and formulating rules that would convert expressions in that analog into the shapes in which they are actually uttered. Of course, even such an agglutinative analog, with its accompanying conversion rules, could be interpreted merely as a descriptive device. But it was not in general taken that way; instead, it was taken as a direct reflection of reality. We seemed to be convinced that, whatever might superficially appear to be the case, every language is ‘really’ agglutinative.

It is worth noting that in a regular agglutinating language such as Turkish, morphological formatives can be regarded as morphemes contributing their own meanings in a compositional calculus. However, in order to understand morphological processing across human languages, a general algorithmic theory is required that covers both the many non-agglutinative systems and the more agglutinative-like systems.

If the trend in current linguistic morphology is moving in the right direction, the questions of whether and how a complex word is decomposed during reading into its constituent morphemes are not the optimal questions to pursue. A first relevant question in ‘a-morphous’ approaches to morphological processing is how a complex word activates the proper meanings, without necessarily assuming intermediate representations supposedly negotiating between the orthographic input and semantics. A second important question concerns the role of paradigmatic relations during lexical processing.

Of the many models proposed for morphological processing in the psycholinguistic literature, the insights of a-morphous morphology fit best with aspects of the triangle model of Harm and Seidenberg (1999), Seidenberg and Gonnerman (2000), Plaut and Gonnerman (2000), and Harm and Seidenberg (2004). This connectionist model maps orthographic input units onto semantic units without intervening morphological units. The triangle model also incorporates phonological knowledge, seeking to simulate reading aloud within one unified system highly sensitive to the distributional properties of the input, where other models posit two separate streams (orthography to meaning, and orthography to phonology; see, e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001; Borowsky et al., 2006).

In what follows, we propose a computational model, the “naive discriminative reader”, which models morphological processing with an architecture directly mapping form onto meaning, without using specific representations for either bound morphemes or complex words. The model follows the triangle model, but differs in various ways. First, it works with just two levels, orthography and meaning. In this study, we do not address reading aloud, focusing instead on developing a model that properly predicts morphological effects in comprehension. Second, there are no hidden layers mediating the mapping of form onto meaning. Third, the representations that we use for coding the orthographic input and semantic output are symbolic rather than subsymbolic. Fourth, our model makes use of a simple algorithm based on discriminative learning to efficiently estimate the weights on the connections from form to meaning, instead of backpropagation. The research strategy pursued in the present study is to formulate the simplest probabilistic architecture that is sufficiently powerful to predict the kind of morphological effects documented in the processing literature.
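To make the two-layer architecture concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract states that the model uses the equilibrium equations of the Rescorla-Wagner model (Danks, 2003), whereas this sketch simply iterates the trial-by-trial Rescorla-Wagner update until the weights settle. The toy lexicon, the `#` word-boundary marker, the learning rate, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

def cues(word):
    """Orthographic cues: letter unigrams and bigrams ('#' marks word edges, an assumption)."""
    padded = "#" + word + "#"
    unigrams = set(word)
    bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
    return unigrams | bigrams

def train(events, epochs=200, rate=0.01, lam=1.0):
    """Iterate Rescorla-Wagner updates over (word, meanings) learning events."""
    outcomes = set().union(*(meanings for _, meanings in events))
    weights = defaultdict(float)  # (cue, outcome) -> association strength
    for _ in range(epochs):
        for word, meanings in events:
            present = cues(word)
            for outcome in outcomes:
                # Summed support for this outcome from all cues in the input.
                total = sum(weights[(c, outcome)] for c in present)
                # Target is lam if the outcome occurs in this event, else 0.
                target = lam if outcome in meanings else 0.0
                delta = rate * (target - total)
                for c in present:
                    weights[(c, outcome)] += delta
    return weights

def activation(weights, word, outcome):
    """Activation of a meaning: the sum of the weights from the word's cues."""
    return sum(weights[(c, outcome)] for c in cues(word))

# Hypothetical toy lexicon: no node stands for the whole form "hands" or for "-s".
events = [
    ("hand", {"HAND"}),
    ("hands", {"HAND", "PLURAL"}),
    ("land", {"LAND"}),
    ("lands", {"LAND", "PLURAL"}),
]
w = train(events)
print(round(activation(w, "hands", "PLURAL"), 2), round(activation(w, "hand", "PLURAL"), 2))
```

In this toy setting the plural meaning ends up supported by cues such as `s`, `ds`, and `s#` without any dedicated representation for a suffix or for the complex word, which is the property the paragraph above describes.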


Of special interest to our modeling effort are two general classes of phenomena that suggest a form of ‘entanglement’ of words with morphologically related words during lexical processing. Schreuder and Baayen (1997) documented for simple words that the type count of morphologically related words co-determines processing latencies in visual lexical decision. This ‘family size’ effect has been replicated for complex words and emerges also in languages such as Hebrew and Finnish (De Jong, Schreuder, & Baayen, 2000; Moscoso del Prado Martín, Kostić, & Baayen, 2004; Moscoso del Prado Martín et al., 2005; Moscoso del Prado Martín et al., 2004; Baayen, 2010). One interpretation of the family size effect, formulated within the framework of the multiple read-out model of Grainger and Jacobs (1996), assumes that a word with a large family co-activates many family members, thereby creating more lexical activity and hence providing more evidence for a yes-response in lexical decision. Another explanation assumes that resonance within the network of family members boosts the activation of the input word (De Jong, Schreuder, & Baayen, 2003). In the present study, we pursue a third explanation, following Moscoso del Prado Martín (2003, chapter 10), according to which family size effects can emerge straightforwardly in networks mapping forms onto meanings.

The second class of phenomena of interest to us revolves around the processing of inflected words that enter into extensive, highly structured paradigmatic relations with other inflected words. Milin, Filipović Đurđević, and Moscoso del Prado Martín (2009) showed, for Serbian nouns inflected for case and number, that response latencies in the visual lexical decision task are co-determined by both the probabilities of a word’s other case endings, and the probabilities of these case endings in that word’s inflectional class. More precisely, the more a given word’s probability distribution of case inflections differs from the corresponding distribution of its inflectional class, the longer response latencies are.
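The notion of one distribution "differing" from another can be illustrated with relative entropy (Kullback-Leibler divergence). The sketch below uses the plain, unweighted form; the paper's keywords mention a weighted variant, which is not reproduced here. All frequencies are invented for illustration, and the code assumes the class distribution is non-zero wherever the word's distribution is.

```python
import math

def normalise(freqs):
    """Turn raw frequencies into a probability distribution."""
    total = sum(freqs)
    return [f / total for f in freqs]

def relative_entropy(p, q):
    """D(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical frequencies over six inflected forms (illustrative numbers only).
inflectional_class = normalise([300, 100, 200, 250, 50, 100])
typical_word = normalise([30, 10, 20, 25, 5, 10])    # mirrors the class exactly
atypical_word = normalise([5, 60, 5, 10, 10, 10])    # deviates from the class

print(relative_entropy(typical_word, inflectional_class))
print(relative_entropy(atypical_word, inflectional_class))
```

Under the finding summarised above, the second, atypical word (larger divergence from its class) would be the one expected to elicit longer response latencies.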

There are two main options for understanding these results. Under one interpretation, case-inflected variants are stored in memory, with computations over paradigmatically structured sets of exemplars giving rise to the observed effects. This explanation is extremely costly in the number of lexical representations that have to be assumed to be available in memory. We therefore pursue a different explanation, one that is extremely parsimonious in the number of representations required. We will show that these paradigmatic effects can arise in a simple discriminative network associating forms with meanings. Crucially, the network does not contain any representations for complex words: the network embodies a fully compositional probabilistic memory activating meanings given forms.

Although in generative grammar, morphology and syntax have been strictly separated (for an exception, see, e.g., Lieber, 1992), approaches within the general framework of construction grammar (Goldberg, 2006; Booij, 2005, 2009; Dąbrowska, 2009; Booij, 2010) view the distinction between morphology and syntax as gradient. In this framework, the grammar is an inventory of constructions relating form to meaning. From a structural perspective, morphological constructions differ from phrasal or syntactic constructions only in lesser internal complexity. From a processing perspective, morphological constructions, being smaller, should be more likely to leave traces in memory than syntactic constructions. However, at the boundary, similar familiarity effects due to past experience are predicted to arise for both larger complex words and smaller word n-grams. Interestingly, frequency effects have been established not only for (regular) morphologically complex words (see,
