
Showing papers by "Kevin G. Munhall published in 2012"


01 Jan 2012
TL;DR: Surprisingly, toddlers' speech didn't change in response to altered feedback, suggesting that long-held assumptions regarding the role of self-perception in articulatory development need to be reconsidered.
Abstract: Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road North, Mississauga, Ontario L5L 1C6, Canada

Summary
Species-specific vocalizations fall into two broad categories: those that emerge during maturation, independent of experience, and those that depend on early life interactions with conspecifics. Human language and the communication systems of a small number of other species, including songbirds, fall into this latter class of vocal learning. Self-monitoring has been assumed to play an important role in the vocal learning of speech [1–3], and studies demonstrate that perception of your own voice is crucial for both the development and lifelong maintenance of vocalizations in humans and songbirds [4–8]. Experimental modifications of auditory feedback can also change vocalizations in both humans and songbirds [9–13]. However, with the exception of large manipulations of timing [14, 15], no study to date has ever directly examined the use of auditory feedback in speech production under the age of 4. Here we use a real-time formant perturbation task [16] to compare the response of toddlers, children, and adults to altered feedback. Children and adults reacted to this manipulation by changing their vowels in a direction opposite to the perturbation. Surprisingly, toddlers' speech did not change in response to altered feedback, suggesting that long-held assumptions regarding the role of self-perception in articulatory development need to be reconsidered.

Results
In humans, there is a clearly defined linkage between vocal tract configuration and the acoustic structure of speech. The two vocal tract configurations shown in Figure 1A have different resonant frequencies, leading to the amplification of different harmonics in the speech signal. Speech researchers call these amplified harmonics "formants," and listeners rely heavily on formants to determine what consonant or vowel a speaker intended to produce. As speakers shift the configuration of their vocal tract, the formant structure of their utterances shifts accordingly. By attending to the linkage between their own unique vocal tract configurations and the resulting speech acoustics, young children could fine-tune the mapping between motor commands sent from their brains to the vocal-production organs and the resulting acoustic output produced.

In the current study, we look at real-time compensatory behavior in vowel production when auditory feedback is modified. We use a rapid signal processing system to change the formant frequencies of vowels produced by children and adults. Previous work with adults has demonstrated that when talkers receive auditory feedback in which their own vowel formants are shifted to new locations in the vowel space, they rapidly compensate for the perturbations, altering the formant frequencies of the vowels they produce in a direction opposite to the perturbation [16–19]. This response pattern has been interpreted as evidence for the existence of a predictive mechanism in speech motor control [17]. This phenomenon also demonstrates that even adult speakers remain reliant on auditory feedback to fine-tune the accuracy of their vocal productions.

We tested three different age groups of native English speakers: adults (26 adult females with a mean age of 18.9 years), young children (26 children with a mean age of 51.5 months), and toddlers (20 children with a mean age of 29.8 months). Each talker produced 50 utterances of the word "bed." To elicit these utterances from the young children and toddlers, we developed a video game in which the children would help a robot cross a virtual playground by saying the robot's "magic" word "bed" (Figure 1B). During the first 20 utterances, talkers received normal acoustic feedback through a pair of headphones. During the last 30 utterances, talkers received feedback in which the frequencies of their first and second formants (F1 and F2, respectively) were perturbed using a real-time formant shifting system. F1 was increased by 200 Hz and F2 was decreased by 250 Hz. This manipulation changed talkers' productions of the word "bed" into their own voice saying the word "bad."

For each utterance, the "steady-state" F1 and F2 frequency was determined by averaging estimates of that formant from 40% to 80% of the way through the vowel. These results were then normalized for each individual by subtracting the average of that individual's baseline utterances, defined as the average of the last 15 utterances before feedback was altered (i.e., utterances 6–20). For statistical analyses, individual measures of compensation in F1 and F2 were computed with the magnitude based on the difference in average frequency between the last 20 utterances (i.e., utterances 31–50) and the baseline used in normalization. The sign was determined based on whether the change in production opposed (positive) or followed (negative) the direction of the perturbation.

The normalized results, averaged across individuals in each group, are plotted in Figure 2. As in previous formant perturbation experiments [16, 19], the adults spontaneously compensated by altering the frequency of F1 and F2 in a direction opposite to that of the perturbation (top panel). The young children also compensated in a manner similar to the adults (middle panel). However, the toddlers did not alter production of F1 or F2 in response to the perturbation (bottom panel). To verify these observations, we computed individual
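The normalization and compensation measure described above can be summarized in a short sketch. The following Python/NumPy code is one plausible reading of that procedure, not the authors' analysis code; the function names and the assumption that formant estimates arrive as per-utterance arrays are ours.

```python
import numpy as np

def steady_state_formant(formant_track):
    """Average a per-frame formant estimate over the 40%-80% span of the vowel."""
    track = np.asarray(formant_track, dtype=float)
    n = len(track)
    start, stop = int(0.40 * n), int(0.80 * n)
    return float(track[start:stop].mean())

def compensation(steady_state_hz, shift_hz):
    """
    steady_state_hz: steady-state F1 (or F2) values for utterances 1-50, in Hz.
    shift_hz: perturbation applied to that formant (+200 for F1, -250 for F2).

    Returns the signed compensation in Hz: positive if production changed in the
    direction opposite the perturbation, negative if it followed the perturbation.
    """
    f = np.asarray(steady_state_hz, dtype=float)
    baseline = f[5:20].mean()   # utterances 6-20, the last 15 before the shift
    hold = f[30:50].mean()      # utterances 31-50, the last 20 with altered feedback
    change = hold - baseline    # production change relative to baseline
    return float(-np.sign(shift_hz) * change)
```

For example, a talker whose average F1 over utterances 31–50 sat 20 Hz below baseline under the +200 Hz F1 shift would receive a compensation score of +20 Hz (opposing the perturbation) under this sketch.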

61 citations


Journal ArticleDOI
TL;DR: This paper used a real-time formant perturbation task to compare the response of toddlers, children, and adults to altered feedback and found that children and adults reacted to this manipulation by changing their vowels in a direction opposite to the perturbations.

60 citations


Journal ArticleDOI
TL;DR: Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and that this effect is relatively constant across the various temporal offsets.
Abstract: Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, an effect that is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task: gaze was less centralized on the face, less time was spent looking at the mouth, and more time was spent looking at the eyes when a concurrent cognitive load task was added to the speech task.
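As a concrete illustration of the integration measure described above (the proportion of incongruent-trial responses that do not match the audio), here is a minimal Python sketch; the trial-record layout and field names are assumptions made for illustration, not the study's actual analysis code.

```python
def mcgurk_proportion(incongruent_trials):
    """
    incongruent_trials: iterable of dicts, one per incongruent audiovisual trial,
    each with 'response' (what the subject reported) and 'audio' (the acoustic token).
    Returns the proportion of trials whose response did NOT match the audio,
    i.e. the McGurk-response measure of audiovisual integration.
    """
    trials = list(incongruent_trials)
    if not trials:
        return float("nan")
    mcgurk = sum(1 for t in trials if t["response"] != t["audio"])
    return mcgurk / len(trials)
```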

22 citations


Journal ArticleDOI
TL;DR: New evidence shows that infants' gaze fixations to the mouth and eye region shift predictably with changes in age and language familiarity.

9 citations


01 Jan 2012
TL;DR: While no significant correlation was found between the two perturbation conditions, a modest correlation between compensations in pitch and formant frequency was observed within the pitch perturbation condition.
Abstract: Previous studies have demonstrated a wide range in individuals’ compensations in response to real-time alterations of the auditory feedback of both pitch and formant frequencies. One potential source of this variability may be individual differences in the relative weighting of auditory and somatosensory feedback. The present study examined this variability by comparing individuals’ compensations during two perturbation conditions: a pitch shift (+200 cents) and a formant shift (F1 +200 Hz, F2 −250 Hz). While no significant correlation was found between the two perturbation conditions, a modest correlation between compensations in pitch and formant frequency was observed within the pitch perturbation condition.
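To relate the two perturbation magnitudes, note that a cent is 1/1200 of an octave, so a +200 cent pitch shift scales the fundamental frequency by 2^(200/1200) ≈ 1.12. The Python/NumPy sketch below is illustrative only: it shows that conversion plus a plain Pearson correlation between per-subject compensation values, which is one way the across-condition comparison described above could be computed.

```python
import numpy as np

def cents_to_ratio(cents):
    """Convert a pitch shift in cents to a frequency ratio: +200 cents -> 2**(200/1200) ~= 1.122."""
    return 2.0 ** (cents / 1200.0)

def compensation_correlation(comp_condition_a, comp_condition_b):
    """Pearson correlation between two vectors of per-subject compensation values,
    e.g. pitch-shift compensation vs. formant-shift compensation."""
    return float(np.corrcoef(comp_condition_a, comp_condition_b)[0, 1])
```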

4 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the effects of presenting audiovisual stimuli live, as in Sumby and Pollack's (1954) classic demonstration that the sight of a talker's face enhances the identification of auditory speech in noise.
Abstract: In their classic paper, Sumby and Pollack (1954) demonstrated that the sight of a talker’s face enhanced the identification of auditory speech in noise. Recently, there has been interest in the influence of some of their methodologies on audiovisual speech perception. Here, we examine the effects of presenting the audiovisual stimuli live, like Sumby and Pollack. Live presentation yields 3D visual stimuli, higher resolution images, and social conditions not present in modern replications with recorded stimuli and display monitors. Subjects were tested in pairs and alternated in the same session between a live talker (Live Condition) and a live feed of the talker to a television screen (Screen Condition). Order of presentation mode and word lists (monosyllabic English words) were counterbalanced across subjects. Subjects wore sound isolating headphones and signal intensity was controlled across conditions. Word lists were counterbalanced for spoken word frequency and initial consonant structure. Stimuli we...
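The order counterbalancing described above can be sketched as a simple rotation over subjects. The Python code below is an illustrative sketch of one way to alternate presentation mode (Live vs. Screen) and word-list assignment; the list names and the exact counterbalancing scheme are assumptions, not the authors' protocol.

```python
from itertools import cycle

def assign_session_orders(subject_ids, word_lists=("list_A", "list_B")):
    """Rotate through the four mode-by-list orderings across successive subjects."""
    orderings = cycle([
        (("Live", word_lists[0]), ("Screen", word_lists[1])),
        (("Screen", word_lists[0]), ("Live", word_lists[1])),
        (("Live", word_lists[1]), ("Screen", word_lists[0])),
        (("Screen", word_lists[1]), ("Live", word_lists[0])),
    ])
    return {sid: order for sid, order in zip(subject_ids, orderings)}
```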

1 citation


Journal ArticleDOI
TL;DR: In this article, a large set of monosyllabic nouns was presented at 7 signal-to-noise ratios (pink noise) in both audiovisual (AV) and auditory-only (AO) presentation modes to study individual differences in AV speech perception in acoustic noise.
Abstract: The sight of a talker’s face dramatically influences the perception of auditory speech. This effect is most commonly observed when subjects are presented audiovisual (AV) stimuli in the presence of acoustic noise. However, the magnitude of the gain in perception that vision adds varies considerably in published work. Here we report data from an ongoing study of individual differences in AV speech perception when English words are presented in an acoustically noisy background. A large set of monosyllabic nouns was presented at 7 signal-to-noise ratios (pink noise) in both AV and auditory-only (AO) presentation modes. The stimuli were divided into 14 blocks of 25 words, and each block was equated for spoken word frequency using the SUBTLEXus database (Brysbaert and New, 2009). The presentation of the stimulus blocks was counterbalanced across subjects for noise level and presentation mode. In agreement with Sumby and Pollack (1954), the accuracy of both AO and AV presentations increases monotonically with signal strength, with the greatest visual gain occurring when the auditory signal was weakest. These average results mask considerable variability due to subject (individual differences in auditory and visual perception), stimulus (lexical type, token articulation), and presentation (signal and noise attributes) factors. We will discuss how these sources of variance impede comparisons between studies.
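One common way to quantify the visual benefit described above is the per-SNR difference between AV and AO accuracy; a normalized variant that accounts for the room left for improvement is also often used. The Python/NumPy sketch below is illustrative only; the array layout and the choice of gain measures are assumptions, not the authors' analysis.

```python
import numpy as np

def visual_gain(av_accuracy, ao_accuracy):
    """
    av_accuracy, ao_accuracy: proportion of words correctly identified at each
    of the 7 signal-to-noise ratios, in AV and AO modes respectively.
    Returns the raw per-SNR visual gain (AV minus AO); under the pattern reported
    above, this gain is largest at the lowest SNRs, where audition alone is weakest.
    """
    av = np.asarray(av_accuracy, dtype=float)
    ao = np.asarray(ao_accuracy, dtype=float)
    return av - ao

def relative_visual_gain(av_accuracy, ao_accuracy):
    """Gain normalized by the headroom available to vision, (AV - AO) / (1 - AO),
    a common alternative when AO accuracy differs widely across SNRs."""
    av = np.asarray(av_accuracy, dtype=float)
    ao = np.asarray(ao_accuracy, dtype=float)
    return (av - ao) / np.maximum(1.0 - ao, 1e-9)  # guard against AO at ceiling
```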