
Showing papers on "Voice published in 1989"


Journal ArticleDOI
TL;DR: The authors used discriminant analysis to combine the five spectro-temporal variables measured from sound spectrograms of these productions to categorize the tokens as voiced or voiceless in each condition.
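The discriminant-analysis step the TL;DR describes, combining several spectro-temporal measurements into a single voiced/voiceless decision, can be sketched as a two-class Fisher discriminant. The feature names and synthetic data below are assumptions for illustration, not the paper's actual variables or measurements.

```python
import numpy as np

def fisher_discriminant(X0, X1):
    """Fit a two-class Fisher linear discriminant.

    X0, X1: (n_samples, n_features) arrays of voiceless / voiced tokens.
    Returns a weight vector w and a threshold c such that a token x is
    classed as voiced when x @ w > c.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)
    c = 0.5 * (m0 @ w + m1 @ w)   # midpoint of the projected class means
    return w, c

# Hypothetical example: five spectro-temporal variables per token
# (e.g. vowel duration, closure duration, VOT, F1 offset, burst energy).
rng = np.random.default_rng(0)
voiceless = rng.normal([60, 90, 45, 450, 1.0], 5.0, size=(40, 5))
voiced    = rng.normal([110, 60, 10, 300, 0.4], 5.0, size=(40, 5))
w, c = fisher_discriminant(voiceless, voiced)
pred_voiced = voiced @ w > c        # should be mostly True
pred_voiceless = voiceless @ w > c  # should be mostly False
```

With well-separated classes, as here, the projected midpoint threshold recovers essentially all tokens; real spectrogram measurements would overlap far more.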

150 citations


Journal ArticleDOI
TL;DR: Untrained listeners identified 18 different whispered initial consonants significantly better than chance in nonsense syllables and the phonetic features of place and manner of articulation and, to a lesser extent, voicing were correctly identified.
Abstract: Whispering is a common, natural way of reducing speech perceptibility, but whether and how whispering affects consonant identification and the acoustic features presumed important for it in normal speech perception are unknown. In this experiment, untrained listeners identified 18 different whispered initial consonants significantly better than chance in nonsense syllables. The phonetic features of place and manner of articulation and, to a lesser extent, voicing, were correctly identified. Confusion matrix and acoustic analyses indicated preservation of resonance characteristics for place and manner of articulation and suggested the use of burst, aspiration, or frication duration and intensity, and/or first‐formant cutback for voicing decisions.

115 citations


Journal ArticleDOI
Allard Jongman1
TL;DR: Natural speech consonant-vowel syllables followed by [i, u, a] were computer edited to include 20-70 ms of their frication noise in 10-ms steps measured from onset, and analysis revealed that fricative identification in terms of place of articulation is much more affected by a decrease in frication duration than identification in terms of voicing and manner of articulation.
Abstract: Natural speech consonant–vowel (CV) syllables ([f, s, θ, s, v, z, F] followed by [i, u, a]) were computer edited to include 20–70 ms of their frication noise in 10‐ms steps as measured from their onset, as well as the entire frication noise. These stimuli, and the entire syllables, were presented to 12 subjects for consonant identification. Results show that the listener does not require the entire fricative–vowel syllable in order to correctly perceive a fricative. The required frication duration depends on the particular fricative, ranging from approximately 30 ms for [σ, z] to 50 ms for [f, s, v], while [θ, F] are identified with reasonable accuracy in only the full frication and syllable conditions. Analysis in terms of the linguistic features of voicing, place, and manner of articulation revealed that fricative identification in terms of place of articulation is much more affected by a decrease in frication duration than identification in terms of voicing and manner of articulation.

114 citations


Journal ArticleDOI
TL;DR: The results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly, and the voicing boundary is not shifted in the absence of a change in the global percept, even when discrepant auditory-visual information is presented.
Abstract: Visual information provided by a talker's mouth movements can influence the perception of certain speech features. Thus, the "McGurk effect" shows that when the syllable /bi/ is presented audibly, in synchrony with the syllable /gi/ presented visually, a person perceives the talker as saying /di/. Moreover, studies have shown that interactions occur between place and voicing features in phonetic perception when information is presented audibly. In our first experiment, we asked whether feature interactions occur when place information is specified by a combination of auditory and visual information. Members of an auditory continuum ranging from /ibi/ to /ipi/ were paired with a video display of a talker saying /igi/. The auditory tokens were heard as ranging from /ibi/ to /ipi/, but the auditory-visual tokens were perceived as ranging from /idi/ to /iti/. The results demonstrated that the voicing boundary for the auditory-visual tokens was located at a significantly longer VOT value than the voicing boundary for the auditory continuum presented without the visual information. These results demonstrate that place-voice interactions are not limited to situations in which place information is specified audibly. (ABSTRACT TRUNCATED AT 250 WORDS)

95 citations


Journal ArticleDOI
TL;DR: In this paper, Slowiaczek and Dinnsen presented an experiment based on materials recorded in a situation approaching spontaneous dialogue, involving monolingual speakers of one variety of Standard Polish, and found that neither perceptual tests nor measurements of segmental duration gave any grounds for rejecting [voice] neutralization, either in phrase-final devoicing or in internal assimilation.

87 citations


Journal ArticleDOI
Patti Price1
TL;DR: The magnitudes of the male-female differences are similar to those observed for the creaky-normal voicing differences and breathy-normal differences, and may arise from a combination of biological, sociological and acoustical effects.

47 citations


Journal ArticleDOI
Gary R. Kidd1
TL;DR: The authors found that the pattern of changes in articulatory rate in a precursor phrase can affect the perception of voicing in a syllable-initial prestress velar stop consonant, and that articulatory-rate effects were not restricted to the target syllable's immediate context.
Abstract: Three experiments demonstrated that the pattern of changes in articulatory rate in a precursor phrase can affect the perception of voicing in a syllable-initial prestress velar stop consonant. Fast and slow versions of a 10-word precursor phrase were recorded, and sections from each version were combined to produce several precursors with different patterns of change in articulatory rate. Listeners judged the identity of a target syllable, selected from a 7-member /gi/-/ki/ voice-onset-time (VOT) continuum, that followed each precursor phrase after a variable brief pause. The major results were: (a) articulatory-rate effects were not restricted to the target syllable's immediate context; (b) rate effects depended on the pattern of rate changes in the precursor, not on the amount of fast or slow speech or the proximity of fast or slow speech to the target syllable; and (c) shortening of the pause (or closure) duration led to a shortening of VOT boundaries rather than the lengthening previously found in this phonetic context. Results are explained in terms of the role of dynamic temporal expectancies in determining the response to temporal information in speech, and implications for theories of extrinsic vs. intrinsic timing are discussed.

45 citations


Journal ArticleDOI
01 Sep 1989-Lingua
TL;DR: In this paper, the authors claim that at intermediate stages of derivation, Spanish word-final /s/ is followed by an unattached slot on the skeletal tier, as the phonological marker of Word Boundary.

43 citations


01 Jan 1989
TL;DR: An analysis of the relationship between sonorancy and voicing is presented based on a theory of segmental structure which recognizes the possibility that voice may not be a unitary phenomenon and proposes that voicing has two distinct realizations.
Abstract: In this paper we present an analysis of the relationship between sonorancy and voicing based on a theory of segmental structure which recognizes the possibility that voice may not be a unitary phenomenon. We propose that voicing has two distinct realizations. One is through the activation of laryngeal features and the other is spontaneous voicing. We propose that sonorants involve a node which represents spontaneous voicing and that this may also be present in obstruents. We adopt the term SV for this node (see also Piggott 1989). We thus view spontaneous voice as distinct from laryngeal features, sharing the general position of Stevens & Keyser (1989) who state that "voice might be classed as a manner feature" separate from the laryngeal features which deal with laryngeal configurations. An advantage of the feature geometry that we propose in this paper is that it allows an account of sonorant-sonorant interactions, as sonorant features such as [nasal] and [lateral] are both dominated by the SV node. A further advantage is that we are able to shift the burden of explanation from the rule component to the representational component. By recognizing two types of voicing, we eliminate the use of redundancy rules in the specification of voice for sonorants as a method of accounting for phonological processes.
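The proposed segment structure, a spontaneous-voicing (SV) node separate from the laryngeal node and dominating sonorant features such as [nasal] and [lateral], can be sketched as a small data structure. The attribute names below are illustrative, not the authors' formalism.

```python
# Minimal sketch of the two realizations of voicing: sonorants carry an
# SV (spontaneous-voicing) node; obstruents may instead be voiced via a
# laryngeal [voice] feature. Feature names here are hypothetical.
nasal_stop = {
    "laryngeal": {},                            # no laryngeal voicing spec
    "SV": {"nasal": True, "lateral": False},    # spontaneous voicing node
    "place": {"labial": True},
}

voiced_obstruent = {
    "laryngeal": {"voice": True},               # voicing via laryngeal feature
    "SV": None,                                 # no spontaneous voicing
    "place": {"coronal": True},
}

def is_spontaneously_voiced(segment):
    """A segment is spontaneously voiced iff it carries an SV node."""
    return segment["SV"] is not None
```

On this representation, sonorant-sonorant interactions fall out naturally because [nasal] and [lateral] sit under the same SV node, which is the structural point the abstract makes.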

42 citations


Journal ArticleDOI
TL;DR: Using a neuroelectric event-related potential paradigm, the authors found numerous effects indicating bilateral components reflecting the voicing and place contrasts, and unique right-hemisphere discrimination of both voicing and place of articulation.

42 citations



Journal ArticleDOI
TL;DR: This paper examined the influence of postvocalic voicing on vowel and closure durations in ˈVCV and VˈCV sequences and found that both stressed and unstressed vowels tended to lengthen before voiced consonants, but the vowel-lengthening effect was not as consistent for stressless vowels as for stressed vowels.
Abstract: Previous research has shown that English vowel length varies depending on the voicing characteristic of the following consonant. For stop consonants, closure durations also vary as a function of consonantal voicing. Generally, vowel‐stop sequences containing voiced consonants show longer vowel durations and shorter closure durations than similar sequences containing voiceless consonants. These previous studies have focused on stressed vowels in monosyllabic or bisyllabic words. Very little research has examined the effects of postvocalic voicing on stressless vowels. In the present study, the influence of postvocalic voicing on vowel and closure durations in ˈVCV and VˈCV sequences is studied. Subjects produced sentence pairs containing target words contrasting in intervocalic consonantal voicing (e.g., adopt‐atop, tabbing‐tapping). Both stressed and unstressed vowels tended to lengthen before voiced consonants. However, the vowel‐lengthening effect was not as consistent for stressless vowels as for stressed vowels. Closure durations were longer for voiceless stops than voiced stops after a stressed vowel. However, voicing effects on closure duration were inconsistent after stressless vowels. The results have implications concerning perceptual cues for intervocalic voicing and for issues concerning syllable‐internal structure. [Work supported by NIH.]

Journal ArticleDOI
TL;DR: The results indicated that perception was consistently more advanced than production, that correlations between comparable perception and production measures were nonsignificant, and that a pairwise-comparisons analysis showed perceptual consistency was not adult-like until 10 years of age.
Abstract: The purposes of this study were to assess: (a) the development of identification and discrimination in children for the vowel duration cue to final consonant voicing and (b) the perception/producti...

Journal ArticleDOI
TL;DR: The results suggest that the basic mechanism for the identification of consonants in chimpanzees is similar to that in humans, although chimpanzees are less accurate than humans in the discrimination of consonants.
Abstract: The perception of consonants which were followed by the vowel [a] was studied in chimpanzees and humans, using a reaction time task in which reaction times for discrimination of syllables were taken as an index of similarity between consonants. Consonants used were 20 natural French consonants and six natural and synthetic Japanese stop consonants. Cluster and MDSCAL analyses of reaction times for discrimination of the French consonants suggested that the manner of articulation is the major determinant of the structure of the perception of consonants by the chimpanzees. Discrimination of stop consonants suggested that the major grouping in the chimpanzees was by voicing. The place of articulation from the lips to the velum was reproduced only in the perception of the synthetic unvoiced stop consonants in the two dimensional MDSCAL space. The phoneme-boundary effect (categorical perception) for the voicing and place-of-articulation features was also examined by a chimpanzee using synthetic [ga]-[ka] and [ba]-[da] continua, respectively. The chimpanzee showed enhanced discriminability at or near the phonetic boundaries between the velar voiced and unvoiced and also between the voiced bilabial and alveolar stops. These results suggest that the basic mechanism for the identification of consonants in chimpanzees is similar to that in humans, although chimpanzees are less accurate than humans in discrimination of consonants.

Journal ArticleDOI
TL;DR: In this article, the authors examine French stop consonants in isolated utterances and show that the timing relationship between the onset/offset of voice and the release of the closure provides a very reliable acoustic criterion to separate voiced from voiceless stops, and a major cue for the perception of the voicing feature.
Abstract: Previous studies on the perception of French stop consonants in isolated utterances have demonstrated that the timing relationship between the onset/offset of voice and the release of the closure provides a very reliable acoustic criterion to separate voiced from voiceless stops, and a major cue for the perception of the voicing feature. The aim of this work is to examine stops pronounced spontaneously during a conversation. Experiment 1, an acoustic analysis of spontaneous productions, largely confirms the high reliability of voice timing cues but also shows that they can be occasionally misleading. Experiment 2, where subjects were asked to identify stops contained in excerpts from the conversation, confirms the major part played by voice timing in the perception of voicing. It also suggests that the reliability of the voice timing cues is not optimally exploited by the perceptual system. Experiment 3 and a control condition, where subjects were asked to write down the contents of excerpts varying in duration, demonstrate that the supplementary information that allows correct identification in the frame of the conversation is provided by top-down processes. These experiments also suggest that secondary acoustic cues play a decisive part in the case of conflict between voice timing cues and top-down information. © 1989, SAGE Publications. All rights reserved.
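The voice-timing criterion described here can be caricatured as a single threshold on voice onset time (VOT) relative to the release. The function and boundary value below are illustrative assumptions, not taken from the paper: French voiced stops typically show voicing onset before the release (negative VOT), voiceless stops after it.

```python
def classify_stop_by_vot(vot_ms, boundary_ms=0.0):
    """Classify a French stop as voiced or voiceless from its VOT.

    vot_ms: voice onset time in milliseconds, measured relative to the
    release of the closure (negative = voicing starts before release).
    boundary_ms: decision boundary; 0 ms is a rough stand-in for the
    prevoiced vs. short-lag split typically reported for French.
    """
    return "voiced" if vot_ms < boundary_ms else "voiceless"

# A prevoiced token (voicing leads the release by 80 ms) vs. a
# voiceless token with a 25-ms voicing lag.
labels = [classify_stop_by_vot(v) for v in (-80.0, 25.0)]
```

The paper's point is precisely that in spontaneous speech such a one-dimensional rule is occasionally misled, and listeners recover via secondary cues and top-down information.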

Journal ArticleDOI
TL;DR: A different method of detecting learning savings during acquisition was used, using a set of complex symbols standing for phones, with the elements representing voicing and place and no reaction time advantage emerged in the consistent condition, further evidence of nonanalytic acquisition.
Abstract: Previous research (Byrne, 1984) showed that adults who learned to read an orthography representing phonetic features (voicing, place of articulation) did not readily obtain usable knowledge of the mapping of phonetic features onto orthographic elements, as evidenced by failure to generalize to partially new stimuli. The present Experiment 1 used a different method of detecting learning savings during acquisition. Subjects learned a set of complex symbols standing for phones, with the elements representing voicing and place. In a second acquisition set, the signs for voicing were reversed. Learning speed was not affected, which was consistent with the claim that feature-element links went unnoticed in initial acquisition. In Experiment 2, some subjects were instructed to "find the rule" embodied in the orthography. None did, and acquisition rates were no different from those of uninstructed subjects. In Experiment 3, subjects had 4 h of training on the orthography, with consistent feature-symbol mapping for half of the subjects and arbitrary pairings for the remainder. No reaction time advantage emerged in the consistent condition, which is further evidence of nonanalytic acquisition. The results are related to data from children learning to read.


Patent
12 Apr 1989
TL;DR: In this patent, a linear-prediction coefficient and a residual waveform are obtained for the voiced section, a pitch period is extracted to delimit individual pitch sections, and a normalization power is defined.
Abstract: PURPOSE: To preserve waveform continuity and suppress the deterioration of sound quality by separating an input voice into vowel and consonant sections and changing the vocalization speed in each section in accordance with its vocalization features. CONSTITUTION: The voice and silent sections of an A/D-converted input voice are discriminated by an analysis part 2, the voiceless-consonant and voiced sections of the input voice are discriminated, and these waveforms are stored. A linear-prediction coefficient and a residual waveform are obtained for the voiced section, a pitch period is extracted to delimit individual pitch sections, and a normalization power is defined. Vowels are separated from voiced consonant parts by using resonance frequency and the normalization power. When a control part 4 extends the length of a silent section, or repeats or thins the pitch periods of the voiced section with a suitable distribution, the vocalization speed is changed and a new pitch-period string is prepared. A waveform connection part 6 connects the respective parts, extending or shortening their time lengths based on the new pitch-period string, to obtain a new voice waveform.
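The repeat-or-thin step for voiced pitch periods can be sketched as below. The segmentation into pitch periods is assumed to be given (the pitch-period string the patent describes); the function name and the even-distribution selection rule are illustrative, not the patent's exact procedure.

```python
def rescale_voiced_section(pitch_periods, rate):
    """Change vocalization speed by repeating or thinning pitch periods.

    pitch_periods: list of waveform chunks, one per pitch period.
    rate: speaking-rate factor; rate < 1.0 slows speech down (periods
    are repeated), rate > 1.0 speeds it up (periods are thinned).
    Returns a new list of about len(pitch_periods) / rate periods.
    """
    n_out = max(1, round(len(pitch_periods) / rate))
    # Distribute repeats/deletions evenly across the section so the
    # waveform stays continuous at period boundaries.
    idx = [min(len(pitch_periods) - 1, int(i * rate)) for i in range(n_out)]
    return [pitch_periods[i] for i in idx]

periods = [[0.0] * 10 for _ in range(8)]        # 8 dummy 10-sample periods
slow = rescale_voiced_section(periods, 0.5)     # each period repeated: slower
fast = rescale_voiced_section(periods, 2.0)     # every other period: faster
```

Because whole pitch periods are copied or dropped, pitch and spectral envelope are untouched; only the duration of the voiced section changes, which is the continuity property the patent aims for.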

Journal ArticleDOI
TL;DR: The results indicated that the effects of the parameters are additive and that, although presence/absence of periodicity (VOT and VTT) is the most important determinant of perceived voicing, perception is also to a large extent affected by “C2”-duration and “preceding vowel” duration.

Journal ArticleDOI
TL;DR: It is hypothesized that the acoustic properties of a syllable-initial /s/ make the noise cohere with the following speech signal, which makes it difficult for listeners to focus on the VOT differences to be discriminated.
Abstract: When discriminating pairs of speech stimuli from an acoustic voice onset time (VOT) continuum (for example, one ranging from /ba/ to /pa/), English-speaking subjects show a characteristic performance peak in the region of the phonemic category boundary. We demonstrate that this “category boundary effect” is reduced or eliminated when the stimuli are preceded by /s/. This suppression does not seem to be due to the absence of a phonological voicing contrast for stop consonants following /s/, since it is also obtained when the /s/ terminates a preceding word and (to a lesser extent) when broadband noise is substituted for the fricative noise. The suppression is stronger, however, when the noise has the acoustic properties of a syllable-initial /s/, all else being equal. We hypothesize that these properties make the noise cohere with the following speech signal, which makes it difficult for listeners to focus on the VOT differences to be discriminated.

Patent
01 Sep 1989
TL;DR: In this patent, a speech-section length measuring means tracks the length of the user's utterance, and the end of the speaker's voicing is detected quickly and reliably by adapting the end-detection time to the length of the speech section immediately preceding the silence.
Abstract: PURPOSE: To detect the end of a speaker's utterance quickly and reliably by providing a speech-section length measuring means that measures the length of the user's speech, and by varying the detection time for the utterance end according to the length of the speech section immediately before silence is detected. CONSTITUTION: Knowledge of human speaking habits is used to control the end-detection time adaptively, distinguishing silence during thinking time from silence after the end of an utterance: (1) a long pause is likely after a drawn-out word such as 'Well, ...'; (2) the utterance is likely to continue after a word with a large power value such as '..., but'; (3) when the mean pause length observed since the start of the utterance is long, the maximum pause tends to be long; and (4) when pauses since the start of the utterance are frequent, the longest pause also tends to be long. Therefore, the end-detection time is set longer when, for example, the mean pause length is long, an emphasized sound or a long vowel has just occurred, or pauses have been frequent.
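The adaptive rule can be sketched as a silence timeout that grows with the observed pause statistics and the length of the last speech segment. All names, weights, and the base value are illustrative assumptions; the patent gives no concrete numbers.

```python
def end_detection_time_ms(last_speech_ms, mean_pause_ms, pause_rate,
                          base_ms=400.0):
    """Adaptive end-of-utterance timeout, following the heuristics above.

    last_speech_ms: length of the speech section right before silence.
    mean_pause_ms:  mean pause length observed since the utterance began.
    pause_rate:     pauses per second observed since the utterance began.
    Longer preceding speech, longer mean pauses, and more frequent pauses
    all extend the timeout (weights are invented for illustration).
    """
    timeout = base_ms
    timeout += 0.3 * last_speech_ms   # long word -> expect a longer pause
    timeout += 1.5 * mean_pause_ms    # long pauses so far -> wait longer
    timeout += 100.0 * pause_rate     # frequent pauses -> wait longer
    return timeout

# A terse utterance vs. a hesitant, drawn-out one.
short = end_detection_time_ms(200.0, 100.0, 0.2)
long_ = end_detection_time_ms(1500.0, 400.0, 1.0)
```

A recognizer using this rule cuts off terse speakers quickly but waits patiently for hesitant ones, which is exactly the trade-off the patent targets.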


Journal ArticleDOI
TL;DR: An acoustic-perceptual investigation was performed on various aspects of timing in the speech of a 21-year-old adult speaker of Thai who reportedly did not start speaking until the age of 7, indicating that speech timing skills related to stop consonant voicing, vowel length, and rhythm can be differentially impaired.
Abstract: An acoustic-perceptual investigation was performed on various aspects of timing in the speech of a 21-year-old adult speaker of Thai who reportedly did not start speaking until the age of 7. Selected aspects of timing included: (1) the voicing contrast in Thai homorganic word-initial stops; (2) the duration contrast in Thai short and long vowels; and (3) the duration patterns of phrases and sentences in Thai connected speech. Measures of stop consonant voicing and vowel length were taken from monosyllabic citation forms; measures of syllables, phrases, and sentences from an oral reading of a paragraph-sized passage. Findings indicated that speech timing skills related to stop consonant voicing, vowel length, and rhythm can be differentially impaired and, moreover, that the pattern of impairment appears to be related to the size of the temporal planning unit.

Patent
29 Dec 1989
TL;DR: In this patent, a synthesized voice with well-balanced time lengths between vocal sounds is obtained, even when its voicing speed is varied, by determining the section length of the stationary part of each vowel from the mora length (which varies with voicing speed) using a function set for each vowel, and by expanding or contracting and connecting the voice parameters according to that section length.
Abstract: PURPOSE: To obtain a synthesized voice with well-balanced time lengths between vocal sounds when the voicing speed of the synthesized voice is varied, by determining the section length of the stationary part of each vowel according to the mora length (which varies with the voicing speed) using a function set for each vowel, and by expanding or contracting and connecting the voice parameters according to that section length. CONSTITUTION: A phoneme-data read part 1 reads phoneme data out of a phoneme data file 2 according to the input vocal-sound sequence information. A vowel-length determination part 3 then determines the length of the stationary part of the vowel from the supplied mora information: the function relating mora length to vowel stationary-part length is used to determine and secure the vowel length, and the length of the transition part from vowel to consonant (or vice versa) is found to control the time length of the phoneme before connection. Consequently, even when the voicing speed of the synthesized voice is varied, a synthesized voice with well-balanced phoneme time lengths is obtained according to the mora length. COPYRIGHT: (C)1991,JPO&Japio
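The per-vowel mapping from mora length to stationary-part length can be sketched as a clamped linear function. The coefficients, clamp bounds, and vowel inventory below are invented for illustration; the patent only states that such a function is set for each vowel.

```python
# Hypothetical per-vowel coefficients: stationary_ms = a * mora_ms + b,
# clamped to [lo, hi] so extreme speaking rates stay pronounceable.
VOWEL_COEFFS = {
    "a": (0.6, -20.0, 20.0, 180.0),
    "i": (0.5, -15.0, 15.0, 150.0),
    "u": (0.5, -15.0, 15.0, 140.0),
}

def stationary_part_ms(vowel, mora_ms):
    """Length of the vowel's stationary part for a given mora length."""
    a, b, lo, hi = VOWEL_COEFFS[vowel]
    return min(hi, max(lo, a * mora_ms + b))

slow_a = stationary_part_ms("a", 250.0)   # slow speech, long mora
fast_a = stationary_part_ms("a", 80.0)    # fast speech, short mora
```

Because the slope is below 1.0, the stationary part absorbs less of a rate change than the mora as a whole, leaving room for the vowel-consonant transitions, which is one plausible reading of the "good balance" the patent claims.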

Journal ArticleDOI
TL;DR: In this paper, an algorithm for the automatic alignment of phonetic events with x-ray microbeam articulation data and the corresponding acoustic signal is described. But the system is presently speaker dependent, and has to be trained to the articulatory data for the particular speaker.
Abstract: This paper describes an algorithm for the automatic alignment of phonetic events with x‐ray microbeam articulation data and the corresponding acoustic signal. The algorithm uses a two‐step procedure similar to that of Nelson [Nelson et al., J. Acoust. Soc. Am. Suppl. 1 63, S32 (1978); Nelson, AT&T Bell Laboratories Internal Rep. (1978)]. The first step locates the phrase boundary in continuous speech, and the second step matches the phonetic segments in each phrase. Articulatory and acoustic events are recognized in continuous speech, and matched to the predicted phonetic events using a dynamic programming technique. The place of articulation and voicing for certain phonemes are also matched with articulatory and acoustic events. The system is presently speaker dependent, and has to be trained to the articulatory data for the particular speaker. [Work supported by Ohio State University Speech and Hearing Department.]
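The second step, matching observed events to predicted phonetic events, can be sketched as a standard dynamic-programming alignment (edit distance with a per-pair match cost). The event labels and cost values below are illustrative, not taken from the paper.

```python
def align_events(predicted, observed, skip_cost=1.0):
    """Align two event sequences with dynamic programming.

    predicted, observed: sequences of event labels (e.g. 'closure',
    'burst', 'voicing-on'). Returns the total alignment cost and the
    list of matched index pairs (i, j).
    """
    n, m = len(predicted), len(observed)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, m + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if predicted[i - 1] == observed[j - 1] else 2.0
            cost[i][j] = min(cost[i - 1][j - 1] + match,   # match/substitute
                             cost[i - 1][j] + skip_cost,   # skip predicted
                             cost[i][j - 1] + skip_cost)   # skip observed
    # Trace back the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = 0.0 if predicted[i - 1] == observed[j - 1] else 2.0
        if cost[i][j] == cost[i - 1][j - 1] + match:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs

# A predicted 'burst' event that the recognizer failed to detect.
total, pairs = align_events(["closure", "burst", "voicing-on"],
                            ["closure", "voicing-on"])
```

In the real system the match cost would compare articulatory and acoustic evidence (including place and voicing) rather than exact label equality, but the DP recurrence is the same.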

Journal ArticleDOI
TL;DR: The authors manipulated, at two levels, the macroprosodic contour in which word-initial consonants were embedded, and showed that the direction of F0 change after stops differing in voicing depends on the surrounding intonation contour.
Abstract: Kohler argues, on the basis of German data, that differences in F0 before or after voiced and voiceless stops (the microprosodies) may be obliterated by the larger scale effects on F0 of the surrounding intonation contour (the macroprosodies) [Kohler, Phonetica 39, 199–218 (1982)]. In a similar fashion, Silverman claims, on the basis of English data, that the direction of F0 change after stops differing in voicing depends on the surrounding macroprosodic contour [Silverman, Phonetica 43, 76–91 (1986)]; The macroprosodic contour in which word‐initial consonants [p, t, b, d, sp, n] were embedded at two levels was manipulated: (1) The consonant occurred either immediately before a stressed vowel or before an unstressed vowel, e.g., palace versus police; and (2) the words occurred either in focus or with focus on the preceding word, e.g., I saw a few churches on Monday, but many PALACES on Wednesday versus I saw only a few on Monday but MANY palaces on Wednesday. These manipulations were entirely orthogonal a...

Journal ArticleDOI
TL;DR: The authors found that full voicing is generally independent of variation in the oral closure interval of a lax stop: full voicing will not necessarily occur even for lax stop closures of short duration.