
Modelling the N400 brain potential as change in a probabilistic representation of meaning.

TLDR
The authors provide a unified explanation of the N400 in a neural network model that avoids the commitments of traditional approaches to meaning in language and connects human language comprehension with recent deep learning approaches to language processing.


I like coffee with cream and dog? Change in an implicit probabilistic representation captures meaning processing in the brain

Milena Rabovsky*, Steven S. Hansen, & James L. McClelland*
Department of Psychology, Stanford University
Word count: 10,275

*Corresponding authors: Milena Rabovsky (milena.rabovsky@gmail.com), James L. McClelland (mcclelland@stanford.edu)
bioRxiv preprint, this version posted June 14, 2018; doi: https://doi.org/10.1101/138149. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Abstract
The N400 component of the event-related brain potential has aroused much interest
because it is thought to provide an online measure of meaning processing in the brain. Yet,
the underlying process remains incompletely understood and actively debated. Here, we
present a computationally explicit account of this process and the emerging representation of
sentence meaning. We simulate N400 amplitudes as the change induced by an incoming
stimulus in an implicit and probabilistic representation of meaning captured by the hidden
unit activation pattern in a neural network model of sentence comprehension, and we propose
that the process underlying the N400 also drives implicit learning in the network. The model
provides a unified account of 16 distinct findings from the N400 literature and connects
human language comprehension with successful deep learning approaches to language processing.

I like coffee with cream and dog? Change in an implicit probabilistic representation
captures meaning processing in the brain
The N400 component of the event-related brain potential (ERP) has received a great
deal of attention, as it promises to shed light on the brain basis of meaning processing. The
N400 is a negative deflection recorded over centro-parietal areas peaking around 400 ms after
the presentation of a potentially meaningful stimulus. The first report of the N400 showed that
it occurred on presentation of a word violating expectations established by context: given “I take my coffee with cream and”, the anomalous word “dog” produces a larger N400 than the congruent word “sugar” [1]. Since this study, the N400 has been used as a dependent variable in over 1,000 studies and has been shown to be modulated by a wide range of variables including sentence context, category membership, repetition, and lexical frequency, amongst others [2].
However, despite the large amount of data on the N400, its functional basis continues to be
debated: various competing verbal descriptive theories have been proposed [3–8], but their capacity to capture all the relevant data is difficult to determine unambiguously due to the lack of implementation, and none has yet offered a generally accepted account [2].
Here, we provide both support for and formalization of the view that the N400 reflects
the input-driven update of a representation of sentence meaning – one that implicitly and
probabilistically represents all aspects of meaning as it evolves in real time during
comprehension [2]. We do so by presenting an explicit computational model of this process.
The model is trained and tested using materials generated by a simplified artificial
microworld (see below) in which we can manipulate variables that have been shown to affect
the N400, allowing us to explore how these factors affect processing. The use of these
synthetic materials prevents us from simulating N400 responses to the specific sentences used
in empirical experiments. Nevertheless, using these artificial materials, we are able to show
that the model can capture the effects of a broad range of factors on N400 amplitudes.

The model does not exactly correspond to any existing account of the N400, as it
implements a distinct perspective on language comprehension. Existing accounts are often
grounded, at least in part, in modes of theorizing based on constructs originating in the
1950s [9], in which symbolic representations (e.g., of the meanings of words) are retrieved from memory and subsequently integrated into a compositional representation – an annotated structural description thought to serve as the representation of the meaning of a sentence [10–12].
Even though perspectives on language processing have evolved in a variety of ways, many
researchers maintain the notion that word meanings are first retrieved from memory and
subsequently assigned to roles in a compositional representation. The account we offer here
does not employ these constructs and thus may contribute to the effort to rethink aspects of
several foundational issues: What does it mean to understand language? What are the
component parts of the process? Do we construct a structural description of a spoken
utterance in our mind, or do we more directly construct a representation of the speaker’s
meaning? Our work suggests different answers than those often given to these questions.
Our model, called the Sentence Gestalt (SG) model, was initially developed 30 years
ago [13,14] with the goal of illustrating how language understanding might occur without relying
on the traditional mode of theorizing described above. The model sought to offer a
functional-level characterization of language understanding in which each word in a sentence
someone hears or reads provides clues that constrain the formation of an implicit
representation of the event or situation described by the sentence. The initial work with the
model [14] established that it could capture several core aspects of language, including the ability
to resolve ambiguities of several kinds; to use word order and semantic constraints
appropriately; and to represent events described by sentences never seen during the network’s
training. A subsequent model using a similar approach successfully mastered a considerably
more complex linguistic environment [15]. The current work extending the SG model to address
N400 amplitudes complements efforts to model neurophysiological details underlying the
N400 [16–18].
The design of the model (Fig. 1) reflects the principle that listeners continually update
their representation of the situation or event being described as each incoming word of a
sentence is presented. The representation is an internal representation (putatively
corresponding to a pattern of neural activity, modeled in an artificial neural network) called
the sentence gestalt (SG) that depends on connection-based knowledge in the update part of
the network. The SG pattern can be used to guide responses to potential queries about the
event or situation being described by the sentence (see Implicit probabilistic theory of
meaning section in online methods). The model is trained with sentences and queries about
the events the sentences describe, so that it can, if probed, provide responses to such queries.
Although we focus on a very simple microworld of events and sentences that can describe
them, the model exemplifies a wider conception of a neural activation state that represents a
person's subjective understanding of a broad range of situations and of the kinds of inputs that
can be used to update this understanding. The input could be in the form of language
expressing states of affairs (e.g., “Her hair is red.”) or even non-declarative language. For
example, the question “What time is it?” communicates that the speaker would like to know
the time. Though we focus only on linguistic input here, the input guiding the formation of
these representations could also come from witnessing events directly; from pictures, sounds,
or movies; or from any combination of linguistic or other forms of input.
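
To make this concrete, the following is a minimal sketch of an SG-style network in PyTorch. All names (SentenceGestaltModel), layer sizes, and the use of a GRU cell for the update network are illustrative assumptions rather than the paper's exact architecture; the original model specifies its own update and query networks.

```python
import torch
import torch.nn as nn

class SentenceGestaltModel(nn.Module):
    """Hypothetical sketch of an SG-style comprehension network."""

    def __init__(self, vocab_size, sg_size, probe_size, filler_size, hidden_size=100):
        super().__init__()
        # Update network: each incoming word revises the sentence gestalt (SG).
        # A GRU cell stands in here for the paper's own update architecture.
        self.update = nn.GRUCell(vocab_size, sg_size)
        # Query network: combines the SG with a probe about the described event
        # and outputs activations over possible fillers (read as probabilities).
        self.query = nn.Sequential(
            nn.Linear(sg_size + probe_size, hidden_size),
            nn.Sigmoid(),
            nn.Linear(hidden_size, filler_size),
            nn.Sigmoid(),
        )

    def forward(self, words, probe):
        # words: (seq_len, batch, vocab_size) word inputs, one-hot by assumption
        # probe: (batch, probe_size) query about the event (e.g., a role)
        sg = words.new_zeros(words.size(1), self.update.hidden_size)
        sg_states = []
        for w in words:  # present the sentence one word at a time
            sg = self.update(w, sg)  # input-driven update of the SG pattern
            sg_states.append(sg)
        answer = self.query(torch.cat([sg, probe], dim=-1))
        return answer, sg_states
```

Training such a network would pair each sentence with queries about the event it describes and adjust weights so that the query output approximates the correct fillers; the sequence of SG states is what the semantic update measure below is computed from.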
The magnitude of the update produced by each successive word of a sentence
corresponds to the change in the model’s implicit representation that is produced by the word,
and it is this change, we propose, that is reflected in N400 amplitudes. Specifically, the
semantic update (SU) induced by the current word n is defined as the sum across the units in
the SG layer of the absolute value of the change in each unit’s activation produced by the
current word n. For a given unit (indexed below by the subscript i), the change is simply the
difference between the unit's activation after word n and after word n-1:

$$\mathrm{SU}(n) = \sum_i \left| a_i(n) - a_i(n-1) \right|$$

where $a_i(n)$ is the activation of SG unit $i$ after word $n$.

Citations

The ERP response to the amount of information conveyed by words in sentences (vol 140, pg 1, 2015)

TL;DR: The authors investigated whether event-related potentials (ERPs), too, are predicted by information measures, and found that different information measures quantify cognitively different processes and that readers do not make use of a sentence's hierarchical structure for generating expectations about the upcoming word.
Posted Content DOI

The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing

TL;DR: It is found that the most powerful ‘transformer’ networks predict neural responses at nearly 100% and generalize across different datasets and data types (fMRI, ECoG), suggesting that inherent structure – and not just experience with language – crucially contributes to a model’s match to the brain.
Journal Article DOI

Multimodal Language Processing in Human Communication.

TL;DR: Proposes cognitive mechanisms that may explain the binding of multiple, temporally offset signals under the tight time constraints posed by a turn-taking system, and calls for a multimodal, situated psycholinguistic framework to unravel the full complexities of human language processing.
Journal Article DOI

Toward a Neurobiologically Plausible Model of Language-Related, Negative Event-Related Potentials.

TL;DR: Proposes a theoretical framework based on a predictive coding architecture, within which negative language-related ERP components such as the N400 can be accounted for in a neurobiologically plausible manner, and suggests that latency and topography differences between these components reflect the locus of prediction errors and model updating within a hierarchically organized cortical predictive coding architecture.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Proceedings Article DOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature (global matrix factorization and local context window methods) and produces a vector space with meaningful substructure.
Journal Article DOI

Finding Structure in Time

TL;DR: Develops a proposal along these lines, first described by Jordan (1986), which involves the use of recurrent links to provide networks with a dynamic memory, and suggests a method for representing lexical categories and the type/token distinction.
Journal Article DOI

A Neural Substrate of Prediction and Reward

TL;DR: Findings in this work indicate that dopaminergic neurons in the primate, whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events, can be understood through quantitative theories of adaptive optimizing control.
Book

Modularity of mind

Frequently Asked Questions (8)
Q1. What is the only knowledge that is apparent in the model’s performance at the output layer?

The only knowledge that is apparent in the model’s performance at the output layer concerns the possible filler concepts for the agent role and their relative frequency, as well as a beginning tendency to activate the correct agent slightly more than the others. 

The query-answer form is used instead of directly providing the complete event description at the output layer to keep the set of probes and fillers more open-ended and to suggest the broader framework that the task of sentence comprehension consists in building internal representations that can be used as a basis to respond to probes [13]. 

For type (2), changing position of agent and action, the conditional probability of the semantic features associated with the critical word (again, crucially, not at this position in the sentence but in general within the described event) is 1.0 in the condition with the changed word order and 0.4 in the condition with the normal word order. 

The authors assume that in reality, the adjustment of the semantic activation occurs continuously in time as auditory or visual language input is processed, so that the earliest arriving information about a word (whether auditory or visual) immediately influences the evolving SG representation [64]. 

A study found SV word order to be a valid cue to the agent role in 95/100 sentences in English but only 35/100 sentences in Dutch [41]. 

Furthermore, the expected value of the summed divergence measure is 0 if the estimates match the probabilities for all C. Because the real learning environment is rich and probabilistic, the number of possible sentences that may occur in the environment is indefinite, and it would not in general be possible to represent the estimates of the conditional probabilities explicitly (e.g., by listing them in a table). 

The authors used two-sided paired t-tests to analyze differences between conditions; when a simulation experiment involved more than one comparison, significance levels were Bonferroni-corrected within the simulation experiment. 

The authors also examined the model's capacity to assign roles correctly when the reversal anomaly context (e.g., 'the fox on the poacher') was followed by a verb that it had experienced in such contexts during training (e.g., 'watched'; see Supplementary Fig. 13 for details on the training environment).