Voice

About: Voice is a research topic. Over its lifetime, 2,393 publications have been published within this topic, receiving 56,637 citations.


Papers
Journal Article
TL;DR: The findings from these experiments indicate that phoneme and rate information are encoded in an integral manner during speech perception, while talker characteristics are encoded separately.
Abstract: The acoustic structure of the speech signal is extremely variable due to a variety of contextual factors, including talker characteristics and speaking rate. To account for the listener’s ability to adjust to this variability, speech researchers have posited the existence of talker and rate normalization processes. The current study examined how the perceptual system encodes information about talker and speaking rate during phonetic perception. Experiments 1–3 examined this question using a speeded classification paradigm developed by Garner (1974). The results of these experiments indicated that decisions about phonemic identity were affected by both talker and rate information: irrelevant variation in either dimension interfered with phonemic classification. While rate classification was also affected by phoneme variation, talker classification was not. Experiment 4 examined the impact of talker and rate variation on the voicing boundary under different blocking conditions. The results indicated that talker characteristics influenced the voicing boundary only under certain conditions, and only when talker variation occurred within a block of trials. Rate variation, however, influenced the voicing boundary whether or not rate varied within a block of trials. The findings from these experiments indicate that phoneme and rate information are encoded in an integral manner during speech perception, while talker characteristics are encoded separately.

69 citations
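The interference logic of the Garner paradigm can be made concrete in a few lines. The Python sketch below uses hypothetical reaction times and an illustrative helper name (garner_interference); it is a minimal sketch of the comparison the paradigm rests on, not the study's analysis pipeline.

```python
import statistics

# Hypothetical reaction times (ms). The paradigm compares a baseline
# block, where the irrelevant dimension (talker or rate) is held fixed,
# with an orthogonal block, where it varies unpredictably across trials.
baseline_rt = [512, 498, 530, 505, 521]
orthogonal_rt = [561, 574, 552, 580, 566]

def garner_interference(baseline, orthogonal):
    """Mean classification slowdown caused by irrelevant variation.
    A reliably positive value is the usual evidence that the two
    dimensions are processed integrally rather than separately."""
    return statistics.mean(orthogonal) - statistics.mean(baseline)

print(f"Interference: {garner_interference(baseline_rt, orthogonal_rt):.1f} ms")
```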

01 Jan 2000
TL;DR: In this article, the authors examined long distance voicing agreement between consonants (Cs) and showed that these agreement patterns come about through a correspondence relation that is established between Cs in the output.
Abstract: This paper examines long-distance voicing agreement between consonants (Cs). Two related patterns are observed. In the first, [voice] agreement is restricted to pairs of oral stops that match in place of articulation, as seen in Ngbaka. In the second, observed in Kera, [voice] agreement occurs among all pairs of stops. I argue that these agreement patterns come about through a correspondence relation that is established between Cs in the output. The notion of intersegmental correspondence will be important in explaining two key properties of the phenomena: (i) the potential for interaction between Cs at a distance, and (ii) the preference for voicing agreement to occur between similar Cs. From a wider perspective, this analysis is supported by work on other consonantal agreement patterns that display similar characterizing properties (Walker 1999, Rose & Walker in prep.). In addition, I propose that the correspondence approach has the potential to extend to cases of voicing dissimilation. The analysis is couched within Optimality Theory (OT; Prince & Smolensky 1993). The paper is organized as follows. In §2 I present the data illustrating voicing agreement between Cs at a distance. §3 diagnoses the agreement as arising through the mechanism of segmental correspondence rather than feature spreading. In §4 I lay out a theoretical overview of the correspondence approach to long-distance agreement, and then develop the details of the analysis of Ngbaka and Kera. §5 discusses an extension to voicing dissimilation phenomena, and §6 gives the conclusion.

69 citations
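To make the evaluation mechanism concrete, here is a minimal Python sketch of Optimality-Theoretic candidate evaluation, the framework the analysis above is couched in. The constraint names (Corr-CC, Ident-CC(voice), Ident-IO(voice)) follow the general correspondence literature, and the candidates and violation counts are schematic assumptions, not the paper's actual tableaux.

```python
# In OT, the winning candidate is the one whose violation profile,
# read in constraint-ranking order, is lexicographically minimal.
def ot_winner(candidates, ranking):
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

# Schematic tableau for a hypothetical input /k...d/ (two stops at a
# distance): agreement under correspondence wins when Corr-CC and
# Ident-CC(voice) both outrank input-output faithfulness to [voice].
ranking = ["Corr-CC", "Ident-CC(voice)", "Ident-IO(voice)"]
candidates = {
    "g...d (corresponding, agreeing)":    {"Corr-CC": 0, "Ident-CC(voice)": 0, "Ident-IO(voice)": 1},
    "k...d (not corresponding)":          {"Corr-CC": 1, "Ident-CC(voice)": 0, "Ident-IO(voice)": 0},
    "k...d (corresponding, disagreeing)": {"Corr-CC": 0, "Ident-CC(voice)": 1, "Ident-IO(voice)": 0},
}
print(ot_winner(candidates, ranking))  # -> "g...d (corresponding, agreeing)"
```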

Journal Article
TL;DR: The authors showed that the essential cues for understanding spoken language are largely dynamic in nature, derived from the complex modulation spectrum (including both amplitude and phase) below 20 Hz, segmentation of the speech signal into syllabic intervals between 50 and 400 ms, and a multi-time-scale, coarse-grained analysis of phonetic constituents into features based on voicing, manner and place of articulation.
Abstract: Classical models of speech recognition (by both human and machine) assume that a detailed, short‐term analysis of the acoustic signal is essential for accurately decoding spoken language. Several lines of evidence call this assumption into question: (1) intelligibility is relatively unimpaired when the frequency spectrum is distorted under a wide range of conditions (including cross‐spectral asynchrony, reverberation, waveform time reversal and selective deletion of 80% of the spectrum), (2) the acoustic properties of spontaneous speech rarely conform to canonical patterns associated with specific phonetic segments, and (3) automatic‐speech‐recognition phonetic classifiers often require ca. 250 ms of acoustic context (spanning several segments) to function reliably. This pattern of evidence suggests that the essential cues for understanding spoken language are largely dynamic in nature, derived from (1) the complex modulation spectrum (incorporating both amplitude and phase) below 20 Hz, (2) segmentation of the speech signal into syllabic intervals between 50 and 400 ms, and (3) a multi‐time‐scale, coarse‐grained analysis of phonetic constituents into features based on voicing, manner and place of articulation. [Work supported by the U.S. Department of Defense and NSF.]

69 citations
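The abstract's central quantity, the modulation spectrum below 20 Hz, is straightforward to sketch computationally. The NumPy/SciPy example below substitutes a synthetic amplitude-modulated tone for real speech and is only a minimal illustration of the envelope-based analysis, under assumed parameters (16 kHz sampling, a 4 Hz syllable-like modulation rate).

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000                      # assumed sample rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
# Stand-in "speech": a 500 Hz carrier amplitude-modulated at a
# syllable-like 4 Hz rate (real speech would be loaded from a file).
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)

envelope = np.abs(hilbert(x))   # amplitude envelope via the analytic signal
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, 1 / fs)

low = freqs < 20                # the band the abstract highlights
peak = freqs[low][np.argmax(spectrum[low])]
print(f"Dominant modulation frequency below 20 Hz: {peak:.1f} Hz")  # ~4 Hz here
```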

Journal Article
TL;DR: Speech phoneme intelligibility was measured in a closed-set word discrimination test and through phonetic transcriptions of the spoken materials and an analysis of perceptual confusions revealed that errors were most frequently associated with the voicing feature and that few manner or place of articulation errors occurred.
Abstract: Five normal-speaking adult males were taught to produce speech using an electrolarynx. Speech phoneme intelligibility was measured in a closed-set word discrimination test and through phonetic transcriptions of the spoken materials. Mean percentages of correct identification for the five talkers were 90% and 57% for the word-identification test and the phonetic transcriptions, respectively. An analysis of perceptual confusions revealed that errors were most frequently associated with the voicing feature and that few manner or place of articulation errors occurred. Over the range of variables observed, neither the intensity of the speech, nor the intensity of the noise radiating directly from the electrolarynx, nor the spectrum of the radiated noise, nor the speaking rate was found to be a determinant of intelligibility.

69 citations
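The feature-level confusion analysis the abstract describes can be sketched as a simple tally over target-response pairs. The tiny phoneme feature table and confusion pairs below are illustrative stand-ins, not the study's data.

```python
# Tally how often a misidentified consonant differs from the target
# in voicing, manner, or place of articulation.
FEATURES = {  # phoneme: (voicing, manner, place) -- illustrative subset
    "p": ("voiceless", "stop", "labial"),
    "b": ("voiced", "stop", "labial"),
    "t": ("voiceless", "stop", "alveolar"),
    "d": ("voiced", "stop", "alveolar"),
    "s": ("voiceless", "fricative", "alveolar"),
}

# Hypothetical (target, response) confusions from a transcription task.
confusions = [("p", "b"), ("t", "d"), ("d", "t"), ("t", "s"), ("b", "p")]

errors = {"voicing": 0, "manner": 0, "place": 0}
for target, response in confusions:
    for name, i in (("voicing", 0), ("manner", 1), ("place", 2)):
        if FEATURES[target][i] != FEATURES[response][i]:
            errors[name] += 1

print(errors)  # voicing errors dominate, mirroring the reported pattern
```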

Journal Article
TL;DR: The results were interpreted as indicating that the magnitude of VOT difference required for distinguishing between prevocalic stop cognates decreases as a function of the age of the listeners.
Abstract: Perceptual development of the voicing contrast was investigated in two-year-old children, six-year-old children, and adults. Subjects were required to identify pre-vocalic stop consonants from synt...

68 citations
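A common way to quantify the voicing boundary in identification data like these is to fit a logistic function to the proportion of voiceless responses along a synthetic VOT continuum and read off its 50% crossover. The sketch below uses hypothetical adult-like identification data and SciPy's curve_fit; it illustrates the measure, not the study's procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical identification data: proportion of "voiceless" responses
# at each voice onset time (VOT) step along a synthetic continuum.
vot_ms = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)
p_voiceless = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.97, 0.99])

def logistic(x, boundary, slope):
    """Logistic identification function; 'boundary' is the 50% crossover."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, vot_ms, p_voiceless, p0=[30.0, 0.2])
print(f"Voicing boundary: {boundary:.1f} ms VOT")  # crossover near 28-30 ms
```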


Network Information
Related Topics (5)

Speech perception: 12.3K papers, 545K citations, 85% related
Speech processing: 24.2K papers, 637K citations, 78% related
First language: 23.9K papers, 544.4K citations, 75% related
Sentence: 41.2K papers, 929.6K citations, 75% related
Noise: 110.4K papers, 1.3M citations, 74% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    102
2022    248
2021    56
2020    73
2019    81
2018    88