
More than Words: Neurophysiological Correlates of Semantic Dissimilarity Depend on Comprehension of the Speech Narrative
Michael P. Broderick¹, Nathaniel J. Zuk¹, Andrew J. Anderson² and Edmund C. Lalor¹,²

¹School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland.
²Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA.

Correspondence: Michael Broderick: brodermi@tcd.ie; Edmund Lalor: edmund_lalor@urmc.rochester.edu
Abstract
Speech comprehension relies on the ability to understand the meaning of words within a coherent
context. Recent studies have attempted to obtain electrophysiological indices of this process by
modelling how brain activity is affected by a word’s semantic dissimilarity to preceding words. While
the resulting indices appear robust and are strongly modulated by attention, it remains possible that,
rather than capturing the contextual understanding of words, they may actually reflect word-to-word
changes in semantic content without the need for a narrative-level understanding on the part of the
listener. To test this possibility, we recorded EEG from subjects who listened to speech presented in
either its original, narrative form, or after scrambling the word order by varying amounts. This
manipulation affected the ability of subjects to comprehend the narrative content of the speech, but not
the ability to recognize the individual words. Neural indices of semantic understanding and low-level
acoustic processing were derived for each scrambling condition using the temporal response function
(TRF) approach. Signatures of semantic processing were observed for conditions where speech was
unscrambled or minimally scrambled and subjects were able to understand the speech. The same
markers were absent for higher levels of scrambling when speech comprehension dropped below
chance. In contrast, word recognition remained high and neural measures related to envelope tracking
did not vary significantly across the different scrambling conditions. This supports the previous claim
that electrophysiological indices based on the semantic dissimilarity of words to their context reflect a
listener’s understanding of those words relative to that context. It also highlights the relative
insensitivity of neural measures of low-level speech processing to speech comprehension.
bioRxiv preprint doi: https://doi.org/10.1101/2020.12.14.422789; this version posted December 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

1. Introduction
Understanding a word’s meaning following the coherent linguistic context in which it appears forms
the basis for natural speech comprehension. Recent studies have attempted to obtain
electrophysiological indices of this process by modelling how brain responses are affected by a word’s
meaning relative to its context (Broderick, Anderson, Di Liberto, Crosse, & Lalor, 2018; Dijkstra,
Desain, & Farquhar, 2020; Frank & Willems, 2017). One particular approach involved regressing
electroencephalographic responses to natural speech against the semantic dissimilarity of words to their
preceding context (Broderick et al., 2018). This produced neurophysiological model measures that
appeared to be exquisitely sensitive to a listener’s speech understanding. This sensitivity was
demonstrated by comparing the neural measures for forward speech and time-reversed speech; by
investigating speech processing under different levels of perceived noise; and in tasks where speech
was either attended or unattended (Broderick et al., 2018). In each of these experiments, speech
understanding coincided with a neural model measure closely resembling the N400 component of the
event related potential, which has long been associated with the processing of meaning (Kutas &
Federmeier, 2011; Kutas & Hillyard, 1980).
We have proposed that this neural model measure, known as a semantic temporal response function (TRF), reflects the semantic processing of words in their context. However, given that the approach is
based on calculating the semantic dissimilarity of words to their preceding context, it remains possible
that the measure reflects a sensitivity to word-to-word changes in semantic content without the need for
a narrative-level understanding on the part of the listener. Furthermore, the human brain processes
speech at multiple linguistic levels, including the analysis of words’ phonological, lexical, and syntactic
properties (Davis & Johnsrude, 2003; de Heer, Huth, Griffiths, Gallant, & Theunissen, 2017; Hickok &
Poeppel, 2007; Poeppel, Emmorey, Hickok, & Pylkkänen, 2012; Price, 2010). As such, it is possible
that the semantic TRF could also reflect processing at some level lower than semantics, where words
are recognised but their underlying meaning is not processed in relation to the previous context. Indeed,
as mentioned, the morphology and topographical distribution of the semantic TRF shared common
characteristics with the N400. And the N400 component has been shown not only to be sensitive to
semantic properties of words in context but also to lower-level lexical properties like word frequency
and neighbourhood density (Kutas, 1993). Therefore, we wished to further examine the semantic TRF’s
sensitivity to the processing of intelligible speech where all words were lexically identifiable, but the
overall meaning of the speech narrative was not understood. In other words, we wanted to determine
whether the recently derived semantic TRF was specifically sensitive to the processing of words
following a coherent, predictive context, thus reflecting semantic levels of processing, or whether it
could be elicited by the same speech where word order, and thus context, had been manipulated.

To manipulate the narrative coherence of our stimuli while still presenting each word with perfect
intelligibility, we systematically randomised the order of words for a piece of narrative text to create 5
different levels of randomisation, or scrambling. This text was then converted to a speech signal and
presented to subjects while their EEG was recorded. Similar forms of linguistic degradation have been
used in previous studies to investigate syntactic and combinatorial semantic processing (Humphries,
Binder, Medler, & Liebenthal, 2006; Mollica et al., 2018). Inspired by previous studies, where local
temporal information in the acoustic (Kiss, Cristescu, Fink, & Wittmann, 2008; Saberi & Perrott, 1999)
or linguistic (Lerner, Honey, Silbert, & Hasson, 2011) speech stream was manipulated at gradually
increasing temporal windows, we varied the number of consecutive words that could be scrambled to
gradually increase the comprehension difficulty of the speech signal. This created graded levels of
speech understanding with which we could test the sensitivity of the recently derived measures of
semantic processing.
We additionally wished to test the impact of speech comprehension on neural responses relating to the
speech envelope. The amplitude envelope of the speech waveform has been highly emphasised in the
literature (Luo & Poeppel, 2007; Obleser, Herrmann, & Henry, 2012) and is an important cue for speech
perception (Drullman, Festen, & Plomp, 1994a, 1994b; Shannon, Zeng, Kamath, Wygonski, & Ekelid,
1995). This has led researchers to use neural indices of envelope tracking as dependent measures of
speech comprehension (Etard & Reichenbach, 2019; Verschueren, Somers, & Francart, 2019).
However, reliable envelope tracking has also been shown for an array of signals that do not allow
comprehension (Doelling & Poeppel, 2015; Howard & Poeppel, 2010; Lalor, Power, Reilly, & Foxe,
2009; Peña & Melloni, 2012). Fewer studies have compared envelope tracking of speech that is entirely
recognisable to the listener but varies in its degree of semantic comprehensibility. Here we do so by
deriving TRFs to the speech envelope across our different scrambling levels, allowing us to compare
semantic and envelope TRFs for the same speech and the same subjects.
2. Methods
2.1 Subjects
15 subjects (7 female) aged between 19 and 29 participated in the study. All participants were native
English speakers, had self-reported normal hearing, were free of neurological diseases, and provided
written informed consent. All procedures were undertaken in accordance with the Declaration of
Helsinki and were approved by the Ethics Committees of the School of Psychology at Trinity College
Dublin, and the Health Sciences Faculty at Trinity College Dublin.
2.2 Stimuli and Experimental Procedure
Stimuli were acquired from a children's novel (White, 1951). Text from chapters 3-10 of the novel was
split into 60 segments corresponding to the 60 trials in the experiment. Each segment then underwent a

scrambling procedure to generate 5 different versions. This was done by grouping consecutive words
into windows of length w and randomly permuting the words within each window. The window lengths
were selected as w = 1 (unscrambled), 2, 4, 7, 11, so that the interval between consecutive window lengths increased by 1 with each w (i.e., 2 − 1 = 1, 4 − 2 = 2, 7 − 4 = 3, etc.). Audio versions of each text segment (including
the unscrambled text segments) were then generated using Google’s Text-to-Speech. This software is
powered by WaveNet (Oord et al., 2016), a generative model for raw audio based on a deep neural
network. It generates realistic, human-sounding voices and has been rated significantly more natural-sounding than the best parametric and concatenative systems for English (Oord et al., 2016). Each
speech segment, or trial, lasted ~60 seconds and could be heard as one of the five scrambled versions.
Two sets of questionnaires were generated for each trial. The first was a comprehension questionnaire
that consisted of 6 questions with 2 possible answers, pertaining to the content of the trial. The second
was a lexical identification questionnaire, in which 6 words were each presented with a yes/no choice as to whether the word had appeared in the trial. 2-4 of these words were nouns/verbs that had appeared in the trial, and
the remainder were nouns/verbs which were randomly selected from the rest of the text and did not
appear in the trial. No subject reported listening to or reading the novel within at least 3 years of
participating in the study.
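For concreteness, the windowed scrambling described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' code; the function name and the seeding scheme are our own choices:

```python
import random

def scramble_words(words, w, seed=0):
    """Permute words within consecutive, non-overlapping windows of
    length w. With w = 1 the text is returned unscrambled; the study
    used w = 1, 2, 4, 7 and 11."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(words), w):
        window = list(words[i:i + w])
        rng.shuffle(window)  # permute word order locally
        out.extend(window)
    return out
```

Because only local windows are permuted, every word remains intact and identifiable, while narrative coherence degrades as w grows.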
For the EEG experiment, trials were presented in chronological story order in 12 blocks of 5 trials. Each
block contained one trial from each of the 5 different scrambling conditions. Condition order was
randomised for each subject and for each block, so that, for example, subject 1 might hear trial 1 with
a scrambling window of 11 whereas subject 2 might hear trial 1 with a scrambling window of 4. After
each trial subjects were presented with a comprehension and lexical identification questionnaire.
Subjects were encouraged to take breaks when needed between trials. Stimuli were presented diotically
at a sampling rate of 44.1 kHz using HD650 headphones (Sennheiser) and Presentation software
(Neurobehavioural Systems). Testing was performed in a dark, sound-attenuated room, and subjects
were instructed to maintain visual fixation on a crosshair centred on the screen for the duration of each
trial, and to minimize eye blinking and all other motor activities.
2.3 Data Acquisition and Preprocessing
128-channel EEG data were acquired at a rate of 512 Hz using an ActiveTwo system (BioSemi).
Offline, the data were downsampled to 128 Hz and bandpass filtered between 0.5 and 8 Hz using a zero-phase-shift fourth-order Butterworth filter. To identify channels with excessive noise, the standard deviation
of the time series of each channel was compared with that of the surrounding channels. For each trial,
a channel was identified as noisy if its standard deviation was more than 2.5 times the mean standard
deviation of all other channels or less than the mean standard deviation of all other channels divided by
2.5. Channels contaminated by noise were recalculated by spline interpolating the surrounding clean
channels. Independent component analysis (Hyvarinen, 1999) was then performed on the EEG data in

order to remove eye blinks. The EEG data were transformed into component space, and components relating
to eye-blinks were identified based on their topographical distribution and component time-series.
These components were removed, and the data were then transformed back to EEG channel space. Data
were then referenced to the average of the 2 mastoid channels and were normalized to zero mean and
unit SD (z units).
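The channel-rejection criterion above can be sketched as follows (illustrative Python with NumPy; the 2.5 threshold follows the text, while the function name and data layout are our own framing):

```python
import numpy as np

def find_noisy_channels(eeg, thresh=2.5):
    """Flag channels whose standard deviation is more than `thresh`
    times, or less than 1/`thresh` of, the mean standard deviation
    of all other channels. `eeg` has shape (n_channels, n_samples)."""
    sds = eeg.std(axis=1)
    noisy = []
    for ch in range(len(sds)):
        others = np.delete(sds, ch).mean()  # mean SD of the remaining channels
        if sds[ch] > thresh * others or sds[ch] < others / thresh:
            noisy.append(ch)
    return noisy
```

Flagged channels would then be replaced by spline interpolation of their clean neighbours, as described above.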
2.4 Stimulus Characterisation
We wished to test the effect of scrambling condition on the neural encoding of low-level and high-level
properties of the speech signal. For the low-level representation we chose the speech envelope, an
approach that has been widely used in recent years (e.g., Kubanek, Brunner, Gunduz, Poeppel, & Schalk, 2013; Lalor & Foxe, 2010). We calculated it by taking the absolute value of the Hilbert transform of the broadband (80 to 3000 Hz) speech signal and passing the result through a zero-phase-shift low-pass filter with a cutoff at 30 Hz.
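A minimal sketch of this envelope computation using SciPy follows; the 80-3000 Hz band and 30 Hz cutoff are taken from the text, but the Butterworth filter order is our assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def speech_envelope(audio, fs, band=(80.0, 3000.0), lp_cutoff=30.0):
    """Broadband amplitude envelope: bandpass the waveform, take the
    absolute value of its Hilbert transform, then low-pass filter the
    result with a zero-phase-shift (filtfilt) Butterworth filter."""
    nyq = fs / 2.0
    b, a = butter(3, [band[0] / nyq, band[1] / nyq], btype="band")
    x = filtfilt(b, a, audio)       # broadband speech signal
    env = np.abs(hilbert(x))        # instantaneous amplitude
    b, a = butter(3, lp_cutoff / nyq, btype="low")
    return filtfilt(b, a, env)      # smooth, zero-phase envelope
```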
We represented the context-based meaning of the words in the speech using semantic dissimilarity.
Semantic dissimilarity quantifies the semantic relationship between words and their previous context.
GloVe, a word embedding model (Pennington, Socher, & Manning, 2014), was used to represent each content word in the stimulus as a vector, or coordinate, in a high-dimensional space. Vectors were derived by factorizing the word co-occurrence matrix of a large text corpus, in this case Common Crawl (https://commoncrawl.org/). The output is a 300-dimensional vector for each word, where each
dimension can be thought to reflect some latent linguistic context. A word’s semantic dissimilarity was
estimated as 1 minus the Pearson’s correlation of the word’s vector and the average vector of words
from the preceding context. Previous studies have chosen this preceding context to be all previous words
in the same sentence (Broderick et al., 2018; Broderick, Di Liberto, Anderson, Rofes, & Lalor, 2020).
However, given that, for scrambled speech, sentence boundaries were randomised, we chose to instead
estimate semantic dissimilarity by comparing a word vector with the averaged vector of its 10 preceding
words. This window was chosen somewhat arbitrarily; however, choosing different context window lengths of 5, 8 and 12 did not qualitatively alter the overall results. The semantic dissimilarity measure was quantified as a vector of impulses, the same length as the presented trial, with impulses at the onset of each content word whose heights were scaled according to their semantic dissimilarity values. Finally, we included an additional word-onset feature as input to the TRF. This is an impulse vector, with impulses at the beginning of each content word, whose heights are fixed at the average of the semantic dissimilarity values in the same trial. The purpose of including this feature was to absorb additional variance in the EEG response related to the acoustic processing of word onsets.
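The per-word dissimilarity computation described above can be sketched as follows (illustrative Python; it assumes the GloVe vectors have already been looked up and stacked in presentation order):

```python
import numpy as np

def semantic_dissimilarity(vectors, context_len=10):
    """For each word, 1 minus the Pearson correlation between its
    embedding and the average embedding of up to `context_len`
    preceding words. `vectors` has shape (n_words, dim); the first
    word has no context and is assigned 0."""
    diss = np.zeros(len(vectors))
    for i in range(1, len(vectors)):
        context = vectors[max(0, i - context_len):i].mean(axis=0)
        r = np.corrcoef(vectors[i], context)[0, 1]
        diss[i] = 1.0 - r  # 0 (identical) up to 2 (anti-correlated)
    return diss
```

These values would then scale the heights of the impulses placed at content-word onsets.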
Speech waveforms for each trial and each scrambling condition (60 trials x 5 conditions) were generated
first (using WaveNet) before the speech envelope or semantic dissimilarity features were estimated.

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393-402.

Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626-634.

Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207(4427), 203-205.

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532-1543).