Topic

Voice

About: Voice is a research topic. Over its lifetime, 2393 publications have appeared within this topic, receiving 56637 citations.


Papers
01 Jan 1999
TL;DR: In this paper, the authors assess whether natural sounding excitation near segment boundaries enhances the intelligibility of formant synthesis and find that synthesized phrases proved more intelligible in noise when excitation at fricative boundaries and in voiced stop closures was structurally appropriate.
Abstract: This work assesses whether natural-sounding excitation near segment boundaries enhances the intelligibility of formant synthesis. Excitation type at fricative-vowel (FV) and vowel-fricative (VF) boundaries and durations of voicing in voiced stop closures are described for one male speaker of British English. Most VF boundaries have mixed aperiodic and periodic excitation, whereas most FV boundaries change abruptly from aperiodic to periodic excitation. Syllable stress, vowel height, and final/non-final position within the phrase influenced the incidence and duration of mixed excitation. Voicing in stop closures varied in well-understood ways. Synthesized phrases proved more intelligible in noise when excitation at fricative boundaries and in voiced stop closures was structurally appropriate. Implications for formant synthesis are discussed.

13 citations

Journal Article
TL;DR: In this paper, acoustic measurements are shown to validate the apparent differences between these two similar phonation types, and relative harmonic intensity and harmonicity were found to be, in general, three-way distinct among Hmong modal, breathy, and whispery phonation.
Abstract: The White dialect of Hmong uses breathy voice as a tonal feature, and also a distinctive whispery voice as a stop consonant feature. In this paper, acoustic measurements are shown to validate the apparent differences between these two similar phonation types. In particular, relative harmonic intensity and harmonicity were found to be, in general, three-way distinct among Hmong modal, breathy, and whispery phonation. The discovery of distinctly pronounced breathy and whispery phonation in a single language has implications for the representational theory which is used to specify the phonetic grammar.

13 citations

01 Jan 2004
TL;DR: A probabilistic framework for landmark-based speech recognition that utilizes the sufficiency and context invariance properties of acoustic cues for phonetic features is presented and results have been obtained for manner recognition and the corresponding landmarks.
Abstract: A probabilistic framework for landmark-based speech recognition that utilizes the sufficiency and context invariance properties of acoustic cues for phonetic features is presented. Binary classifiers of the manner phonetic features "sonorant", "continuant" and "syllabic" operate on each frame of speech, each using a small number of relevant and sufficient acoustic parameters to generate probabilistic landmark sequences. The relative nature of the parameters developed for the extraction of acoustic cues for manner phonetic features makes them "invariant" of the manner of neighboring speech frames. This invariance of manner acoustic cues makes the use of only those three classifiers along with the speech/silence classifier complete irrespective of the manner context. The obtained landmarks are then used to extract relevant acoustic cues to make probabilistic binary decisions for the place and voicing phonetic features. Similar to the invariance property of the manner acoustic cues, the acoustic cues for place phonetic features extracted using manner landmarks are invariant of the place of neighboring sounds. Pronunciation models based on phonetic features are used to constrain the landmark sequences and to narrow the classification of place and voicing. Preliminary results have been obtained for manner recognition and the corresponding landmarks. Using classifiers trained from the phonetically rich TIMIT database, 80.2% accuracy was obtained for broad class recognition of the isolated digits in the TIDIGITS database which compares well with the accuracies of 74.8% and 81.0% obtained by a hidden Markov model (HMM) based system using mel-frequency cepstral coefficients (MFCCs) and knowledge-based parameters, respectively.
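The idea of frame-level manner decisions yielding landmarks can be illustrated with a toy, non-probabilistic sketch. It assumes hard binary outputs for speech/silence, "sonorant", "continuant" and "syllabic" on each frame, and it assumes a conventional mapping from those features to broad classes; neither the function names nor this exact mapping come from the abstract, and the paper's own system produces probabilistic landmark sequences rather than hard labels.

```python
def broad_class(speech, sonorant, continuant, syllabic):
    """Map hard binary manner decisions to a broad class (assumed mapping)."""
    if not speech:
        return "silence"
    if sonorant:
        return "vowel" if syllabic else "sonorant consonant"
    return "fricative" if continuant else "stop"


def landmarks(frames):
    """Return (frame_index, class_before, class_after) at each class change.

    `frames` is a list of (speech, sonorant, continuant, syllabic) booleans,
    one tuple per frame. A landmark is placed wherever the broad class of a
    frame differs from that of the preceding frame.
    """
    labels = [broad_class(*frame) for frame in frames]
    return [
        (i, labels[i - 1], labels[i])
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]


# Hypothetical frame decisions, invented purely for illustration.
demo_frames = [
    (False, False, False, False),  # silence
    (False, False, False, False),  # silence
    (True, False, True, False),    # fricative
    (True, True, True, True),      # vowel
    (True, True, True, True),      # vowel
    (True, False, False, False),   # stop
]

for index, before, after in landmarks(demo_frames):
    print(index, before, "->", after)
```

In a probabilistic version, each binary decision would be replaced by a classifier posterior and the landmark sequence would be scored rather than read off directly.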

13 citations

Journal Article
TL;DR: The data support the influence of both general auditory abilities and unique speech processes on categorical perception of speech and different category boundaries for speech and non-speech stimuli in Hebrew and across languages.
Abstract: The nature of the mechanism responsible for the categorical labeling of stimuli is not clear. One hypothesis suggests that categorization is limited by the 'natural sensitivities' of the auditory system. The alternative hypothesis suggests that categorization is mediated by a special speech mode and is influenced by how speech is produced. The present study attempts to provide some insight into this dilemma by evaluating categorical perception (CP) in speech and non-speech stimuli and across languages. Specifically, the goals of the present study were (1) to compare phonetic boundaries of Hebrew voicing to categorical boundaries (CB) of a two-tone complex which varies in the relative timing of the two tones (TOT) [TOT stimuli are considered to be the non-speech analog to voice-onset time (VOT)], and (2) to re-establish the CB values of the non-speech analog to voicing in American-English speakers using the same TOT continua as the Hebrew speakers and to compare them to the CB of Hebrew-speaking subjects. Our assumption was that if CP is mediated by basic auditory sensitivity, then we expect similar CB for speech and non-speech stimuli and no effect of language on CB. If, however, a special speech code determines CP, then phonetic boundaries are expected to be different from the CB of non-speech stimuli and across languages. Of particular interest is the special case of Hebrew, whose voiced-voiceless distinction in production is very different from that in English. Twelve Hebrew-speaking adults and 12 American-English-speaking adults participated in this study. Stimuli consisted of (a) a two-tone complex continuum that varied in the relative onset time of the lower tone from a lead of -100 ms to a lag of +50 ms in 10 ms steps, and (b) a /ba-pa/ continuum which varied in VOT values similar to (a). Subjects identified TOT stimuli as belonging to one of three categories: leading, simultaneous, or lagging. VOT stimuli were labeled as /ba/ or /pa/. Results show (a) a different phonetic boundary for Hebrew voicing compared to published data on English voicing, (b) different category boundaries for speech and non-speech stimuli in Hebrew, (c) a phonetic boundary for Hebrew voicing that does not align with the VOT values of production, and (d) very similar CB for TOT stimuli in Hebrew- and American-English-speaking subjects. The data support the influence of both general auditory abilities and unique speech processes on categorical perception of speech.
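As a rough illustration of how a category boundary can be read off an identification function, the sketch below builds the continuum described in the abstract (-100 ms to +50 ms in 10 ms steps) and locates the 50% crossover by linear interpolation. The crossover rule, the function names, and the response proportions are all assumptions made for illustration; the abstract does not state how boundaries were estimated, and the proportions are invented, not data from the study.

```python
def continuum_ms(start_ms=-100, stop_ms=50, step_ms=10):
    """Onset-time steps in ms, inclusive of both endpoints."""
    return list(range(start_ms, stop_ms + step_ms, step_ms))


def category_boundary(steps_ms, prop_second_label):
    """Return the onset time (ms) where labeling crosses 50%, or None.

    `prop_second_label[i]` is the proportion of trials at `steps_ms[i]`
    given the second label (for example /pa/). The boundary is linearly
    interpolated between the first pair of adjacent steps that straddle 0.5.
    """
    points = list(zip(steps_ms, prop_second_label))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


steps = continuum_ms()

# Invented proportions of /pa/ responses, one per continuum step.
made_up_props = [0.0] * 10 + [0.25, 0.75] + [1.0] * 4

print(category_boundary(steps, made_up_props))
```

Fitting a logistic function to the proportions is a common alternative to interpolation when responses are noisy.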

13 citations

Journal Article
TL;DR: The results provide arguments for the involvement of the speech motor cortex in phonological discrimination, and suggest a multimodal representation of speech units.

13 citations


Network Information
Related Topics (5)
Speech perception
12.3K papers, 545K citations
85% related
Speech processing
24.2K papers, 637K citations
78% related
First language
23.9K papers, 544.4K citations
75% related
Sentence
41.2K papers, 929.6K citations
75% related
Noise
110.4K papers, 1.3M citations
74% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    102
2022    248
2021    56
2020    73
2019    81
2018    88