scispace - formally typeset
Open Access · Journal Article · DOI

Effects of Attention on Neuroelectric Correlates of Auditory Stream Segregation

TLDR
Evidence is provided for two cortical mechanisms of streaming: automatic segregation of sounds and an attention-dependent buildup process that integrates successive tones within streams over several seconds.
Abstract
A general assumption underlying auditory scene analysis is that the initial grouping of acoustic elements is independent of attention. The effects of attention on auditory stream segregation were investigated by recording event-related potentials (ERPs) while participants either attended to sound stimuli and indicated whether they heard one or two streams or watched a muted movie. The stimuli were pure-tone ABA– patterns that repeated for 10.8 sec with a stimulus onset asynchrony between A and B tones of 100 msec in which the A tone was fixed at 500 Hz, the B tone could be 500, 625, 750, or 1000 Hz, and – was a silence. In both listening conditions, an enhancement of the auditory-evoked response (P1–N1–P2 and N1c) to the B tone varied with Δf and correlated with perception of streaming. The ERP from 150 to 250 msec after the beginning of the repeating ABA– patterns became more positive during the course of the trial and was diminished when participants ignored the tones, consistent with behavioral studies indicating that streaming takes several seconds to build up. The N1c enhancement and the buildup over time were larger at right than left temporal electrodes, suggesting a right-hemisphere dominance for stream segregation. Sources in Heschl's gyrus accounted for the ERP modulations related to Δf-based segregation and buildup. These findings provide evidence for two cortical mechanisms of streaming: automatic segregation of sounds and an attention-dependent buildup process that integrates successive tones within streams over several seconds.



Effects of Attention on Neuroelectric Correlates of Auditory Stream Segregation

Joel S. Snyder¹, Claude Alain¹,², and Terence W. Picton¹,²

¹Baycrest Centre for Geriatric Care, Toronto; ²University of Toronto

Journal of Cognitive Neuroscience 18:1, pp. 1–13. © 2006 Massachusetts Institute of Technology
INTRODUCTION
Making sense of the acoustic environment requires parsing sounds that originate from different physical objects and grouping together sounds that emanate from the same object. These processes play a critical role in a listener’s ability to identify and recognize complex acoustic signals such as speech and music. The collection of internal processes that segregate and group sounds to form representations of auditory objects is called auditory scene analysis (Bregman, 1990). Without auditory scene analysis, forming accurate representations of the external world would fail, especially in complex situations wherein multiple objects produce similar sounds. For example, a listener at a cocktail party must process speech from one person while other speakers are talking (Cherry, 1953). A similar situation arises when a listener focuses on a single musical instrument in an ensemble.
One popular paradigm for studying auditory scene analysis presents low tones (A), high tones (B), and silences (–) in a repeating ABA– pattern (see Figure 1). When the difference in frequency between the A and B tones is small and the repetition rate of the sequence is slow, listeners hear a single stream of tones in a galloping rhythm. When the frequency difference is large and the repetition rate is fast, listeners report the sequence splitting into two streams of tones, each in a metronome-like rhythm.
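To make the pattern concrete, the following sketch (illustrative Python, not the authors' stimulus code) assembles one trial of the ABA– sequence from the parameters described in this study: A = 500 Hz, B ∈ {500, 625, 750, 1000} Hz, a 100-msec stimulus onset asynchrony, and 27 repetitions of the 400-msec ABA– cycle (10.8 sec). The tone duration, sampling rate, and onset/offset ramps are assumptions, since the Methods are not reproduced in this excerpt.

```python
import numpy as np

def aba_sequence(b_freq=750.0, a_freq=500.0, fs=44100,
                 soa=0.100, tone_dur=0.050, n_cycles=27):
    """Build an ABA- pure-tone sequence (A, B, A, silence); one cycle = 4 * SOA.

    Frequencies, the 100-msec SOA, and the 27 cycles (10.8 sec) come from the text;
    tone duration, sampling rate, and the 5-msec ramps are assumptions.
    """
    def tone(freq):
        t = np.arange(int(fs * tone_dur)) / fs
        y = np.sin(2 * np.pi * freq * t)
        ramp = int(0.005 * fs)                      # assumed 5-msec onset/offset ramps
        env = np.ones_like(y)
        env[:ramp] = np.linspace(0, 1, ramp)
        env[-ramp:] = np.linspace(1, 0, ramp)
        return y * env

    slot = int(fs * soa)                            # one 100-msec slot per tone or silence
    cycle = np.zeros(4 * slot)                      # A B A - (the "-" slot stays silent)
    for i, f in enumerate([a_freq, b_freq, a_freq]):
        cycle[i * slot : i * slot + int(fs * tone_dur)] = tone(f)
    return np.tile(cycle, n_cycles)                 # 27 cycles = 10.8 sec

seq = aba_sequence(b_freq=1000.0)                   # the 12-semitone condition
print(seq.shape[0] / 44100)                         # -> 10.8 (seconds)
```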
According to the "peripheral channeling hypothesis," the most powerful cues for stream segregation are those that lead to two or more nonoverlapping activations in the cochlea (Hartmann & Johnson, 1991), such as pure-tone frequency and ear of stimulation. This type of place-based segregation is likely to be carried up through the ascending auditory pathway to the tonotopic fields of the auditory cortex (Kaas & Hackett, 2000). A recent study (Fishman, Arezzo, & Steinschneider, 2004) supported this idea by presenting A and B tones that alternated in frequency while recording multiunit activity from macaque monkey primary auditory cortical neurons that were maximally responsive to the A tones. As the presentation rate and pitch separation of A and B tones increased, the firing rate increased in response to the A tones. Another study reported similar findings from the primary auditory cortex of bats (Kanwal, Medvedev, & Micheyl, 2003). Bee and Klump (2004) measured multiunit activity in the auditory forebrain (homologous to the mammalian auditory cortex) of starlings that showed correspondence with behavioral data from starlings and humans.
Despite the power of the peripheral channeling hypothesis, a number of cues besides those based on

peripheral segregation can lead to streaming (for a review, see Moore & Gockel, 2002), implying that centrally computed features contribute to stream segregation. Further supporting the existence of central mechanisms for streaming is the finding that perception of streaming does not occur immediately but takes several seconds to build up (Anstis & Saida, 1985; Bregman, 1978). Additionally, the effect of a biasing sequence that increases perception of streaming also lasts for several seconds (Beauvois & Meddis, 1997), with a longer time constant for musicians than nonmusicians (Beauvois & Meddis, 1997). Despite this slow buildup and decay for streaming, transient events such as a brief silence in the ABA– pattern or an attention shift can almost completely reset the buildup process (Cusack, Deeks, Aikman, & Carlyon, 2004). The long time constants for buildup and decay of streaming and the influence of musical experience further suggest that critical aspects of streaming occur at higher levels of the auditory system. In particular, the long time constants would be consistent with neuromagnetic correlates of echoic memory in the auditory cortex (Lü, Williamson, & Kaufman, 1992), and with computational modeling of streaming that uses inhibitory time constants typical of the auditory cortex (Kanwal et al., 2003; McCabe & Denham, 1997). Thus, although it is clear that low-level aspects of stream segregation operate at low-level stages of the auditory system, other aspects of streaming likely require computations in the auditory cortex and other cortical structures.
Evidence of attentional effects on the buildup of streaming has further implicated higher-level influences on stream segregation (Carlyon, Cusack, Foxton, & Robertson, 2001). When participants ignore the ABA– pattern presented to one ear by listening to sounds presented to the other ear and then switch their attention to the ABA– pattern, the buildup process of streaming is diminished compared to when participants simply attend to the ABA– patterns for the whole trial. The apparent diminishment of streaming when ignoring the ABA– pattern, however, might be in part due to the process of switching attention rather than an actual effect of ignoring the sounds (Cusack et al., 2004). Further casting doubt on the influence of attention, ignored ABA– patterns that would be perceived as streaming result in a reduction in interference in a visual memory task, compared to patterns that would not be perceived as streaming (Macken, Tremblay, Houghton, Nicholls, & Jones, 2003). These behavioral studies thus lead to an uncertain conclusion about whether, to what extent, and at what level of processing listeners’ attention affects streaming.
Event-related potentials (ERPs) might help provide a clearer answer to these issues by isolating neural events that correspond to distinct aspects of streaming. Furthermore, it is possible to record ERPs when participants are actively listening as well as when they are ignoring sounds, providing a simple means for evaluating the effects of attention on streaming.
Such an approach has been used in previous studies to understand the perception of concurrently presented auditory objects rather than simultaneously unfolding auditory streams. As with stream segregation, perception of multiple concurrent sounds is promoted by increased pitch separation. For example, if one shifts the frequency of a partial in a multicomponent stimulus, all the other frequencies of which derive from a single fundamental, this shifted component is heard as a separate tone (Moore, Glasberg, & Peters, 1986). ERP research on this perceptual phenomenon has identified a negative peak at ~150 msec following presentation of the complex sound, called the "object-related negativity" (ORN). The ORN amplitude varies in direct proportion with perception of two simultaneous auditory objects (Alain, Arnott, & Picton, 2001) and is not affected by selective attention (Alain & Izenberg, 2003).

Figure 1. Five cycles of stimuli used during attend and ignore conditions. Actual trials were composed of 27 cycles. Each bar represents a pure tone in a galloping rhythm.
Earlier ERP research on sequential stream segregation used randomly presented tones during selective attention (Alain & Woods, 1994; Alain, Achim, & Richer, 1993). For example, Alain and Woods (1994) presented concurrent interleaved tone sequences of different frequencies and showed behavioral and ERP evidence that it was easier to selectively process particular pitches when the other task-irrelevant tones were grouped together. This suggested that perceptual grouping overrides the effects of physical similarity during selective attention and that auditory attention, like visual attention, may be allocated to objects (Alain & Arnott, 2000).
Studies by Winkler et al. (2003) and Sussman, Ritter, and Vaughan (1999) used the mismatch negativity (MMN) to study streaming. The MMN is a negative ERP component that peaks ~150 msec following a deviant stimulus in a sequence of homogeneous auditory events (for a review, see Picton, Alain, Otten, Ritter, & Achim, 2000). Studying streaming with the MMN requires events that can only be perceived as deviants if streaming has occurred. For example, Winkler et al. presented simultaneous sequences, one in which the intensity was constant except for occasional deviants and one in which the intensity varied constantly. When the two sequences overlapped in frequency, the constant intensity variations in one sequence obscured the occasional intensity deviants in the other sequence and no MMN to the occasional deviants occurred. However, when the two sequences were widely separated in frequency, an MMN occurred to the occasional deviants, suggesting that streaming had occurred prior to detection of intensity deviants. Although the MMN indicates that stream segregation has occurred, it reveals little about the neural mechanisms underlying streaming because it does not track ongoing processing of tone patterns.
The current study uses a more direct paradigm measuring brain activity that tracks the ABA– pattern as streaming occurs, in hopes of addressing some of these limitations. Using 10.8-sec trials helped to determine whether buildup of neural activity mirrors behavioral buildup of streaming (Anstis & Saida, 1985; Bregman, 1978). To avoid motor contamination of the neural measurements, participants were asked to indicate at the end of the sequence whether they heard one or two streams, enabling us to establish a relationship between ongoing patterns of neural activity and whether streaming had occurred. To identify segregation processes, we compared activity for trials with different frequency separations (Δf) between A and B tones. To identify buildup processes, on the other hand, we compared activity at different 2-sec time bins within the 10.8-sec trials. In one session, we collected ERP data while participants made streaming judgments after the end of each trial. In a separate session, we presented identical sound patterns and asked the same participants to watch a muted subtitled video of their choice and to ignore the auditory stimuli. This manipulation was designed to test whether stream segregation as indexed by ERPs depends on focused attention, thereby allowing us to clarify the stage at which attention affects streaming. The use of muted subtitled movies is important because the text dialogue effectively captures attention while not interfering with auditory processing (Pettigrew et al., 2004).
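To make the timing of this design concrete, the sketch below (illustrative Python, not the authors' analysis code) cuts a single 10.8-sec trial into the five 2-sec bins used for the buildup comparison; the sampling rate, channel count, and the handling of the leftover 0.8 sec are assumptions.

```python
import numpy as np

def bin_trial(trial, fs=250, bin_sec=2.0, n_bins=5):
    """Cut one 10.8-sec trial (channels x samples) into five 2-sec bins.

    Here the last 0.8 sec is simply dropped; how the leftover 0.8 sec was
    handled is an assumption, since the analysis details are not given here.
    """
    samples_per_bin = int(fs * bin_sec)
    bins = [trial[:, i * samples_per_bin:(i + 1) * samples_per_bin]
            for i in range(n_bins)]
    return np.stack(bins)            # shape: (n_bins, n_channels, samples_per_bin)

# Toy example: 64 channels sampled at an assumed 250 Hz for 10.8 sec
trial = np.random.default_rng(1).standard_normal((64, int(250 * 10.8)))
print(bin_trial(trial).shape)        # -> (5, 64, 500)
```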
Based on place models of streaming (McCabe & Denham, 1997; Beauvois & Meddis, 1996; Hartmann & Johnson, 1991), we expected increases in activity as a function of Δf, corresponding to decreased overlap between the activations elicited by the A and B tones. We hypothesized that segregation processes would function independently of attention, whereas neural buildup processes would be affected by attention, as suggested by behavioral studies (Cusack et al., 2004; Carlyon et al., 2001).
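As a rough intuition pump for this prediction (a toy illustration only, not an implementation of the place models cited above), one can compute how the overlap of two idealized excitation patterns on a log-frequency axis shrinks as Δf grows; the Gaussian shape and the 3-semitone bandwidth are arbitrary assumptions.

```python
import numpy as np

def excitation_overlap(delta_semitones, bandwidth=3.0):
    """Fractional overlap of two Gaussian 'excitation patterns' separated by Δf.

    Toy illustration only: the Gaussian shape and bandwidth are assumptions,
    not parameters of the place models cited in the text.
    """
    x = np.linspace(-24.0, 36.0, 2001)          # log-frequency axis in semitones
    a = np.exp(-x**2 / (2 * bandwidth**2))                       # excitation from the A tone
    b = np.exp(-(x - delta_semitones)**2 / (2 * bandwidth**2))   # excitation from the B tone
    return np.minimum(a, b).sum() / a.sum()     # shared area relative to one pattern

for df in (0, 4, 7, 12):                        # the four Δf levels used here
    print(df, round(float(excitation_overlap(df)), 2))   # overlap shrinks as Δf grows
```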
RESULTS
Behavioral Data
Figure 2 shows the mean proportion of trials heard as streaming for each of the Δf conditions. As expected, the likelihood of reporting perception of two streams increased with Δf. At 0 semitone (1 semitone = 1/12 octave), participants rarely reported hearing streaming, whereas at 12 semitones, participants almost always heard streaming by the end of the 10.8-sec trial. At intermediate levels (4 and 7 semitones), participants sometimes heard streaming and sometimes heard the galloping pattern for the whole trial. There was a significant main effect of Δf, F(3,27) = 88.02, p < .001, with all adjacent levels of Δf differing from each other (p < .05).

Figure 2. Group mean (±SE) proportion of trials heard as streaming across participants (n = 10) for the four Δf levels.
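For reference, the nominal Δf values follow from the B-tone frequencies via the standard semitone relation Δf = 12 · log2(fB / fA); a quick check (illustrative Python, not from the paper):

```python
import math

f_a = 500.0
for f_b in (500.0, 625.0, 750.0, 1000.0):
    semitones = 12 * math.log2(f_b / f_a)
    print(f"{f_b:6.0f} Hz -> {semitones:4.1f} semitones")
# 500 -> 0.0, 625 -> 3.9 (~4), 750 -> 7.0, 1000 -> 12.0
```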
Neural Activity Reflecting Frequency-based Segregation
Figure 3A shows ERPs elicited by the onset of the sequence for attend and ignore conditions, collapsed across Δf. The ERPs comprised P1 (~60 msec), N1 (~120 msec), and P2 (~160 msec) waves that were maximal at frontocentral scalp regions (see Figure 3B). There was also a clear negative peak at ~200 msec, referred to as the N2 wave, that was present only when the A and B tones differed in frequency. Following the transient responses, there was a sustained potential (SP) that was negative and maximal over the frontal regions. The N1 and SP showed larger amplitude during active than passive listening, F(1,9) = 43.59 and 70.82, respectively, p < .001. The effect of attention was not significant for the P1, P2, and N2 waves. As shown in Figure 3C, the effect of Δf on the P1, N1, and P2 waves was not significant. However, the N2 showed a significant amplitude increase as a function of Δf, F(3,27) = 12.29, p < .001, with a marginal effect for SP, F(3,27) = 2.95, p = .054. There was no significant interaction between attention and Δf for any of the ERP deflections elicited by the onset of the sequence.
A close examination of the SP revealed periodic fluctuations in amplitude that corresponded closely with the rate of stimulus presentation. These smaller fluctuations were more easily assessed in smaller epochs. Figure 4A shows 2-sec ERPs in the attend condition with all four levels of Δf superimposed at FCz. In Figure 4B are single cycles of the ERPs at FCz and the left and right temporal sites (T7 and T8). The neural activity associated with increasing Δf is best illustrated by subtracting ERPs elicited by stimuli with constant frequency (i.e., the 0 semitone condition) from those obtained when the A and B tones differed in frequency.
This subtraction procedure isolated a series of time-locked ERP waves elicited by the B tone, which included P1 (~60 msec), N1 (~115 msec), and P2 (~175 msec) deflections at frontocentral scalp regions and an N1c (~160 msec) that was maximal over the right temporal electrode (i.e., T8). The results of this subtraction are shown in Figure 5A for attend and ignore conditions.
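The subtraction step itself is straightforward; a schematic version is sketched below (illustrative Python assuming condition-averaged ERPs stored as channels × samples arrays, not the authors' actual pipeline).

```python
import numpy as np

def difference_waves(erp_by_df, baseline_key=0):
    """Subtract the 0-semitone ERP from each larger-Δf ERP, condition by condition.

    erp_by_df: dict mapping Δf in semitones -> ndarray of shape (n_channels, n_samples),
    each already averaged across trials (an assumed data layout).
    """
    baseline = erp_by_df[baseline_key]
    return {df: erp - baseline
            for df, erp in erp_by_df.items() if df != baseline_key}

# Example with fake data: 64 channels, one 400-msec ABA- cycle at an assumed 250 Hz
rng = np.random.default_rng(0)
erps = {df: rng.standard_normal((64, 100)) for df in (0, 4, 7, 12)}
diffs = difference_waves(erps)
print(sorted(diffs))   # -> [4, 7, 12]
```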
Figure 3. Group mean ERPs to the beginning of the trial. (A) ERP response to the first 2 sec of each trial for the attend and ignore conditions collapsed across Δf at the midline frontocentral (FCz) electrode. Horizontal bars above the time scale represent pure tones in the stimulus pattern. (B) Scalp distribution of voltage for the P1, N1, P2, N2, and SP waves at 72, 124, 168, 220, and 620 msec, respectively, in the attend condition collapsed across Δf. Darker regions indicate more activity, with polarity labeled by + and – signs. Isocontour lines represent 0.4 µV/step for P1, N1, P2, and N2 and 0.8 µV/step for SP. (C) Same as (A) for the four Δf levels in the attend condition.

The effects of Δf and attention were examined on the P1, N1, and P2 peak latencies at the nine frontocentral electrodes (Fz/1/2, FCz/1/2, Cz/1/2). We also quantified P1–N1 and N1–P2 peak-to-peak amplitudes, allowing us to examine transient changes in neural activity while controlling for other changes in sustained activity that may overlap with the P1, N1, and P2 waves. The means across the nine frontocentral electrodes were entered into analyses of variance (ANOVAs) to test for effects of attention, Δf, and time. We quantified the peak latency and amplitude of the N1c at the left and right temporal electrodes (T7 and T8). Quantifying the N1c, which arises from current sources in the auditory cortex with a radial orientation (Picton, Alain, Woods, et al., 1999), allowed us to test for hemispheric differences in Δf-based segregation processing. We also examined whether the latency and amplitude of these responses varied as a function of time by dividing the 10.8-sec sequences into five 2-sec periods and averaging the responses within each period. This allowed us to examine whether the neural activity associated with the processing of Δf varied as a function of time.
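As a rough illustration of this quantification step (a sketch under assumed search windows; the paper's exact windows and electrode averaging are not given in this excerpt), peak latencies and P1–N1 / N1–P2 peak-to-peak amplitudes can be extracted as follows.

```python
import numpy as np

def peak(erp, times, t_min, t_max, polarity=+1):
    """Return (latency, amplitude) of the largest positive (or negative) point in a window."""
    mask = (times >= t_min) & (times <= t_max)
    seg, seg_t = polarity * erp[mask], times[mask]
    i = int(np.argmax(seg))
    return seg_t[i], polarity * seg[i]

def p1_n1_p2_measures(erp, times):
    """Peak latencies plus P1-N1 and N1-P2 peak-to-peak amplitudes.

    The search windows below (msec relative to B-tone onset) are assumptions
    chosen around the approximate latencies reported in the text.
    """
    p1_lat, p1_amp = peak(erp, times, 30, 90, +1)    # P1 ~60 msec
    n1_lat, n1_amp = peak(erp, times, 90, 150, -1)   # N1 ~115 msec
    p2_lat, p2_amp = peak(erp, times, 150, 220, +1)  # P2 ~175 msec
    return {"P1": p1_lat, "N1": n1_lat, "P2": p2_lat,
            "P1-N1": p1_amp - n1_amp, "N1-P2": p2_amp - n1_amp}

# Toy example: a synthetic waveform sampled at 1-msec resolution
times = np.arange(0, 400)
erp = (1.0 * np.exp(-(times - 60) ** 2 / 200) - 2.0 * np.exp(-(times - 115) ** 2 / 300)
       + 1.5 * np.exp(-(times - 175) ** 2 / 500))
print(p1_n1_p2_measures(erp, times))
```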
The P1, N1, and P2 latencies decreased, F(2,18) = 11.09, 17.21, and 13.44, p < .005, and the P1–N1 and N1–P2 amplitudes increased, F(2,18) = 6.87 and 18.75, p < .025, as a function of Δf. All linear trends of Δf on peak latencies and peak-to-peak amplitudes were significant (p < .025). There were significantly longer latencies for the N1 and P2, F(1,9) = 5.16 and 13.24, p < .05, and a larger P1–N1 amplitude, F(1,9) = 6.08, p < .05, when participants attended the stimuli. A significant Attention × Δf interaction occurred only for P2 latency,
Figure 4. ERP time course as Δf increased in the attend condition. (A) ERP response at FCz to 5 cycles (2000 msec) of the ABA– pattern with a box around a single repetition of the ABA– pattern for the four Δf levels. (B) Single-cycle ERPs at T7 (left temporal), FCz (frontocentral midline), and T8 (right temporal) showing the effect of varying Δf. Horizontal bars above the time scale represent pure tones in the stimulus pattern.
Figure 5. Difference waves between ERPs elicited by the 0-semitone Δf and those elicited by the 4-, 7-, and 12-semitone Δf for attend and ignore conditions. (A) Difference waves to the 0.4-sec ABA– pattern averaged at T7, FCz, and T8 for attend and ignore conditions. Horizontal bars above the time scale represent pure tones in the stimulus pattern. (B) Normalized average amplitude in the P2 time region (244–300 msec) across nine frontocentral channels (top) and the N1c time region (232–300 msec) at T8 (bottom) in the attend and ignore conditions plotted against Δf, along with the behavioral data from Figure 2.

Citations
Journal ArticleDOI

Modeling the auditory scene: predictive regularity representations and perceptual objects

TL;DR: An account of auditory perception suggesting that representations of predictable patterns, or 'regularities', extracted from the incoming sounds serve as auditory perceptual objects that generate hypotheses about the causal structure of the world.
Journal ArticleDOI

Mismatch negativity (MMN), the deviance-elicited auditory deflection, explained

TL;DR: It is proposed that the MMN is, in essence, a latency- and amplitude-modulated expression of the auditory N1 response, generated by fresh-afferent activity of cortical neurons that are under nonuniform levels of adaptation.
Journal ArticleDOI

Auditory attention : focusing the searchlight on sound

TL;DR: Current research seeks to unravel the complex interactions of pre-attentive and attentive processing of the acoustic scene, the role of auditory attention in mediating receptive-field plasticity in both auditory spatial and auditory feature processing, the contrasts and parallels between auditory and visual attention pathways and mechanisms.
Journal ArticleDOI

A review of visual memory capacity: Beyond individual items and toward structured representations.

TL;DR: The main thesis of this review will be that one cannot fully understand memory systems or memory processes without also determining the nature of memory representations, and how this impacts not only how the capacity of the system is estimated but how memory systems and memory processes are modeled.
Journal ArticleDOI

The what, where and how of auditory-object perception

TL;DR: The fundamental perceptual unit in hearing is the 'auditory object', which is the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment.
References
Journal ArticleDOI

Some Experiments on the Recognition of Speech, with One and with Two Ears

TL;DR: In this paper, the relation between the messages received by the two ears was investigated, and two types of test were reported: (a) the behavior of a listener when presented with two speech signals simultaneously (statistical filtering problem) and (b) behavior when different speech signals are presented to his two ears.
Book

Auditory Scene Analysis: The Perceptual Organization of Sound

TL;DR: Auditory Scene Analysis as discussed by the authors addresses the problem of hearing complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds.
Journal ArticleDOI

Subdivisions of auditory cortex and processing streams in primates

TL;DR: The challenge for future researchers is to understand how this complex system in monkeys analyzes and utilizes auditory information.
Journal ArticleDOI

A multiple source approach to the correction of eye artifacts.

TL;DR: A new multiple source eye correction (MSEC) method of eye artifact treatment based on multiple source analysis is presented, which incorporates a model of brain activity to enhance the precision of topographical EEG analyses.
Journal ArticleDOI

Endogenous brain potentials associated with selective auditory attention

TL;DR: It was concluded that the effect of selective auditory attention on the N1 component is not due solely to an enlargement of the exogenous N1 components of the vertex potential, but rather includes the addition of a prolonged endogenous component.
Frequently Asked Questions (16)
Q1. What are the contributions mentioned in the paper "Effects of attention on neuroelectric correlates of auditory stream segregation" ?

In this paper, the effects of attention on auditory stream segregation were investigated by recording event-related potentials ( ERPs ) while participants either attended to sound stimuli and indicated whether they heard one or two streams or watched a muted movie. 

An additional modulation reflected an increase in neural activity as a function of time while listening to the extended ABA– patterns that showed a similar time course as the buildup of streaming reported in behavioral studies (Anstis & Saida, 1985; Bregman, 1978).

The collection of internal processes that segregate and group sounds to form representations of auditory objects is called auditory scene analysis (Bregman, 1990). 

According to the peripheral channeling hypothesis, segregation of tone patterns depends on activation along tonotopic representations in the cochlea and other subcortical auditory structures (Hartmann & Johnson, 1991).

The relatively high residual variances might reflect in part the activation of multiple sources over a relatively wide area of the superior and lateral temporal surfaces. 

When participants ignore the ABA– pattern presented to one ear by listening to sounds presented to the other ear and then switch their attention to the ABA– pattern, the buildup process of streaming is diminished compared to when participants simply attend to the ABA– patterns for the whole trial.

In the present study, increases in ERP amplitude with larger Δf could have arisen from the segregation of distinct activations corresponding to the A and B tones in tonotopically organized structures.

One popular paradigm for studying auditory scene analysis presents low tones (A), high tones (B), and silences (–) in a repeating ABA– pattern (see Figure 1).

The apparent diminishment of streaming when ignoring the ABA– pattern, however, might be in part due to the process of switching attention rather than an actual effect of ignoring the sounds (Cusack et al., 2004).

The birds were first trained to peck one key when listening to a constant frequency ABA tone pattern in a galloping rhythm (similar to the 0 semitone f condition in the current study), and to press a different key when listening to a single stream of tones either at the tempo of the A tones (i.e., A A . . .) or at the tempo of the B tones (i.e., B . . .). 

Making sense of the acoustic environment requires parsing sounds that originate from different physical objects and grouping together sounds that emanate from the same object. 

The neural activity associated with increasing Δf is best illustrated by subtracting ERPs elicited by stimuli with constant frequency (i.e., the 0 semitone condition) from those obtained when the A and B tones differed in frequency.

In particular, the long time constants would be consistent with neuromagnetic correlates of echoic memory in the auditory cortex (Lü, Williamson, & Kaufman, 1992), and computational modeling of streaming that use inhibitory time constants typical of the auditory cortex (Kanwal et al., 2003; McCabe & Denham, 1997). 

Despite this slow buildup and decay for streaming, transient events such as a brief silence in the ABA– pattern or an attention shift can almost completely reset the buildup process (Cusack, Deeks, Aikman, & Carlyon, 2004).

The long time constants for buildup and decay of streaming and the influence of musical experience further suggest that critical aspects of streaming occur at higher levels of the auditory system. 

The ORN amplitude varies in direct proportion with perception of two simultaneous auditory objects (Alain, Arnott, & Picton, 2001) and is not affected by selective attention (Alain & Izenberg, 2003).