Author

Rachel Baker

Bio: Rachel Baker is an academic researcher from Northwestern University. The author has contributed to research in the topics of speech production and American English. The author has an h-index of 11 and has co-authored 15 publications receiving 770 citations.

Papers
Journal ArticleDOI
TL;DR: The Wildcat corpus as discussed by the authors contains scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English, with a focus on dialogue-based, laboratory-quality speech recordings.
Abstract: This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background.

157 citations

Journal ArticleDOI
TL;DR: Speech production is listener-focused, and talkers modulate their speech according to their interlocutors' needs, even when not directly experiencing the challenging listening condition.
Abstract: This study investigated whether speech produced in spontaneous interactions when addressing a talker experiencing actual challenging conditions differs in acoustic-phonetic characteristics from speech produced (a) with communicative intent under more ideal conditions and (b) without communicative intent under imaginary challenging conditions (read, clear speech). It also investigated whether acoustic-phonetic modifications made to counteract the effects of a challenging listening condition are tailored to the condition under which communication occurs. Forty talkers were recorded in pairs while engaged in “spot the difference” picture tasks in good and challenging conditions. In the challenging conditions, one talker heard the other (1) via a three-channel noise vocoder (VOC); (2) with simultaneous babble noise (BABBLE). Read, clear speech showed more extreme changes in median F0, F0 range, and speaking rate than speech produced to counter the effects of a challenging listening condition. In the VOC condition, where F0 and intensity enhancements are unlikely to aid intelligibility, talkers did not change their F0 median and range; mean energy and vowel F1 increased less than in the BABBLE condition. This suggests that speech production is listener-focused, and that talkers modulate their speech according to their interlocutors’ needs, even when not directly experiencing the challenging listening condition.

144 citations

Journal ArticleDOI
TL;DR: Details of the development of the DiapixUK materials are presented, along with data taken from a large corpus of spontaneous speech that are used to demonstrate its new features, and current and potential applications of the task are discussed.
Abstract: The renewed focus of attention on investigating spontaneous speech samples in speech and language research has increased the need for recordings of speech in interactive settings. The DiapixUK task is a new and extended set of picture materials based on the Diapix task by Van Engen et al. (Language and Speech, 53, 510–540, 2010), where two people are recorded while conversing to solve a ‘spot the difference’ task. The new task materials allow for multiple recordings of the same speaker pairs due to a larger set of picture pairs that have a number of tested features: equal difficulty across all 12 picture pairs, no learning effect of completing more than one picture task and balanced contributions from both speakers. The new materials also provide extra flexibility, making them useful in a wide range of research projects; they are multi-layered electronic images that can be adapted to suit different research needs. This article presents details of the development of the DiapixUK materials, along with data taken from a large corpus of spontaneous speech that are used to demonstrate its new features. Current and potential applications of the task are also discussed.

138 citations

Journal ArticleDOI
TL;DR: It is found that first mentions were more likely to be accented than second mentions, and when differences in accent likelihood were controlled, a significant second mention reduction effect remained, supporting the concept of a direct link between probability and duration, rather than a relationship solely mediated by prosodic prominence.
Abstract: This article examines how probability (lexical frequency and previous mention), speech style, and prosody affect word duration, and how these factors interact. Participants read controlled materials in clear and plain speech styles. As expected, more probable words (higher frequencies and second mentions) were significantly shorter than less probable words, and words in plain speech were significantly shorter than those in clear speech. Interestingly, we found second mention reduction effects in both clear and plain speech, indicating that while clear speech is hyper-articulated, this hyper-articulation does not override probabilistic effects on duration. We also found an interaction between mention and frequency, but only in plain speech. High frequency words allowed more second mention reduction than low frequency words in plain speech, revealing a tendency to hypo-articulate as much as possible when all factors support it. Finally, we found that first mentions were more likely to be accented than second mentions. However, when these differences in accent likelihood were controlled, a significant second mention reduction effect remained. This supports the concept of a direct link between probability and duration, rather than a relationship solely mediated by prosodic prominence.

106 citations

Journal ArticleDOI
TL;DR: The Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English, finds that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1.
Abstract: For a wide range of socio‐political reasons, many conversations across the globe today are between interlocutors who do not share a mother tongue. As a resource for investigating the nature and broad implication of speech communication in a global context, we are currently developing a large database of speech recordings from native and non‐native speakers of English. A key feature of this database is that, in addition to providing recordings of scripted materials, the speakers are recorded in pairs (all possible pairings of native and non‐native English speakers) as they work together on a novel, interactive, goal‐oriented task. The final corpus will include fully segmented and phonetically aligned digital recordings of both the scripted and unscripted speech samples along with complete orthographic transcriptions. We are currently using a first version of this database to track speaker‐listener alignment over the course of a conversation, to compare phonetic features of speech addressed to native versus to non‐native speakers, and to assess communication efficiency across various native and non‐native speaker pairings (e.g., how long does it take for a team‐based task to reach successful completion when one, both, or neither of the team members is a non‐native speaker?).

103 citations


Cited by
Journal ArticleDOI
TL;DR: A review of the effects of adverse conditions (ACs) on the perceptual, linguistic, cognitive, and neurophysiological mechanisms underlying speech recognition is presented in this paper, where the authors advocate an approach to speech recognition that includes rather than neutralises complex listening environments and individual differences.
Abstract: This article presents a review of the effects of adverse conditions (ACs) on the perceptual, linguistic, cognitive, and neurophysiological mechanisms underlying speech recognition. The review starts with a classification of ACs based on their origin: Degradation at the source (production of a noncanonical signal), degradation during signal transmission (interfering signal or medium-induced impoverishment of the target signal), receiver limitations (peripheral, linguistic, cognitive). This is followed by a parallel, yet orthogonal classification of ACs based on the locus of their effect: Perceptual processes, mental representations, attention, and memory functions. We then review the added value that ACs provide for theories of speech recognition, with a focus on fundamental themes in psycholinguistics: Content and format of lexical representations, time-course of lexical access, word segmentation, feed-back in speech perception and recognition, lexical-semantic integration, interface between the speech system and general cognition, neuroanatomical organisation of speech processing. We conclude by advocating an approach to speech recognition that includes rather than neutralises complex listening environments and individual differences.

555 citations

Journal ArticleDOI
TL;DR: Phonetic convergence between talker pairs that vary in the degree of their initial language alignment may be dynamically mediated by two parallel mechanisms: the need for intelligibility and the extra demands of nonnative speech production and perception.
Abstract: This study explores phonetic convergence during conversations between pairs of talkers with varying language distance. Specifically, we examined conversations between two native English talkers and between two native Korean talkers who had either the same or different regional dialects, and between native and nonnative talkers of English. To measure phonetic convergence, an independent group of listeners judged the similarity of utterance samples from each talker through an XAB perception test, in which X was a sample of one talker’s speech and A and B were samples from the other talker at either early or late portions of the conversation. The results showed greater convergence for same-dialect pairs than for either the different-dialect pairs or the different-L1 pairs. These results generally support the hypothesis that there is a relationship between phonetic convergence and interlocutor language distance. We interpret this pattern as suggesting that phonetic convergence between talker pairs that vary in the degree of their initial language alignment may be dynamically mediated by two parallel mechanisms: the need for intelligibility and the extra demands of nonnative speech production and perception.

166 citations

Journal ArticleDOI
TL;DR: This study investigates whether speakers have a context-independent bias to reduce low-informativity words, which are usually predictable and therefore usually reduced; the findings support representational models in which reduction is stored and in which sufficiently frequent reduction biases later production.

139 citations