
Showing papers on "Voice published in 1988"


Journal ArticleDOI
TL;DR: This article proposes the alternative hypothesis that language communities intentionally vary vowel length in order to auditorily enhance the closure-duration cue for voicing distinctions, and shows that a longer initial segment causes a reliable shift in subjects' two-category labeling boundaries toward longer medial gap durations.

176 citations


Journal ArticleDOI
TL;DR: This article explores the role of changes in the longitudinal tension of the vocal folds, as indicated by cricothyroid (CT) muscle activity, in determining the voicing status of a speech segment.
Abstract: Initiation and maintenance of vibrations of the vocal folds require suitable conditions of adduction, longitudinal tension, and transglottal airflow. Thus manipulation of adduction/abduction, stiffening/slackening, or degree of transglottal flow may, in principle, be used to determine the voicing status of a speech segment. This study explores the control of voicing and voicelessness in speech with particular reference to the role of changes in the longitudinal tension of the vocal folds, as indicated by cricothyroid (CT) muscle activity. Electromyographic recordings were made from the CT muscle in two speakers of American English and one speaker of Dutch. The linguistic material consisted of reiterant speech made up of CV syllables where the consonants were voiced and voiceless stops, fricatives, and affricates. Comparison of CT activity associated with the voiced and voiceless consonants indicated a higher level for the voiceless consonants than for their voiced cognates. Measurements of the fundamental frequency (F0) at the beginning of a vowel following the consonant show the common pattern of higher F0 after voiceless consonants. For one subject, there was no difference in cricothyroid activity for voiced and voiceless affricates; in this case, the consonant‐induced variations in the F0 of the following vowel were also less robust. Consideration of timing relationships between the EMG curves for voiced and voiceless consonants suggests that the differences most likely reflect control of vocal‐fold tension for maintenance or suppression of phonatory vibrations. The same mechanism also seems to contribute to the well‐known difference in F0 at the beginning of vowels following voiced and voiceless consonants.

172 citations


Journal ArticleDOI
TL;DR: The results indicated that under instructions of speeded responding, listeners could, on some trials, ignore some later occurring contextual information within the word that specified rate and lexical status, but could not ignore speaking rate entirely.
Abstract: Among the contextual factors known to play a role in segmental perception are the rate at which the speech was produced and the lexical status of the item, that is, whether it is a meaningful word of the language. In a series of experiments on the word-initial /b/-/p/ voicing distinction, we investigated the conditions under which these factors operate during speech processing. The results indicated that under instructions of speeded responding, listeners could, on some trials, ignore some later occurring contextual information within the word that specified rate and lexical status. Importantly, however, they could not ignore speaking rate entirely. Although they could base their decision on only the early portion of the word, when doing so they treated the word as if it were physically short--that is to say, as if there were no later occurring information specifying a slower rate. This suggests that listeners always take account of rate when identifying the voicing value of a consonant, but precisely which information within the word is used to specify rate can vary with task demands.

120 citations


Journal ArticleDOI
TL;DR: The results support the idea that context-dependent processing can be based on coarse aspects of the speech signal, and suggest that the precursive sounds must have some acoustic continuity with the test word for integration to take place.
Abstract: This study investigated the idea that human speech recognition can involve analyzing the speech signal at multiple levels of resolution, using the information obtained from relatively coarse levels of analysis as a context for interpreting detailed acoustic cues to segment identity. Three experiments examined the effectiveness of coarse-grained aspects of speech in inducing rate-dependent processing of closure duration as a cue to phonological voicing in a medial stop consonant (specifically, rabid vs. rapid). Experiment 1 showed that the rate of articulation of a severely filtered precursor phrase influenced voicing judgments about a segment in an unfiltered test word. Experiment 2 showed a similar effect when the amplitude envelope of the precursive speech was filled with a constant-frequency sine wave set at the fundamental of the test word. The contextual effects of these coarse-grained aspects of speech did not differ from those of acoustically detailed precursive speech. Experiment 3 showed that no context-dependent processing occurred when the amplitude envelope of the precursive speech was filled with white noise, indicating that the precursive sounds must have some acoustic continuity with the test word for integration to take place. The results support the idea that context-dependent processing can be based on coarse aspects of the speech signal.

47 citations


Journal ArticleDOI
TL;DR: The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.
Abstract: Previous research has shown that F1 offset frequencies are generally lower for vowels preceding voiced consonants than for vowels preceding voiceless consonants. Furthermore, it has been shown that listeners use these differences in offset frequency in making judgments about final-consonant voicing. A recent production study [W. Summers, J. Acoust. Soc. Am. 82, 847-863 (1987)] reported that F1 frequency differences due to postvocalic voicing are not limited to the final transition or offset region of the preceding vowel. Vowels preceding voiced consonants showed lower F1 onset frequencies and lower F1 steady-state frequencies than vowels preceding voiceless consonants. The present study examined whether F1 frequency differences in the initial transition and steady-state regions of preceding vowels affect final-consonant voicing judgments in perception. The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.

46 citations


Journal ArticleDOI
TL;DR: Rather than relying solely on the temporal characteristics of the VOT interval, a matrix of acoustic cues may influence how a listener perceives word-initial voicing as produced by phonologically disordered children.
Abstract: Spectrographic measures of voice onset time (VOT) were made for phonologically disordered children in whom a voicing contrast was just beginning to emerge. These temporal measures were related to a...

46 citations


Journal ArticleDOI
TL;DR: The perception of phonologically significant speech pattern contrasts was measured in normally hearing subjects presented with F0 contours alone, speechreading alone, and the two in combination; for initial consonant voicing and continuance, the combined score was higher than either of the single-modality scores.
Abstract: The perception of phonologically significant speech pattern contrasts was measured in normally hearing subjects who were presented with F0 contours alone, speechreading alone, and the two in combination. For the suprasegmentals and final consonant voicing, perception in the combined condition was dominated by F0. For the vowel, consonant place, and final consonant continuance contrasts, perception in the combined condition was dominated by vision. For initial consonant voicing and continuance, however, there was clear evidence of interaction between F0 and speechreading. Here, the combined score was higher than either of the single-modality scores, and also higher than could be predicted on the assumption that the auditory and visual channels act as statistically independent channels of information.

44 citations


Book ChapterDOI
Jacqueline Vaissière
01 Jan 1988
TL;DR: This chapter concerns the use of prosodic parameters in automatic speech recognition (ASR): the feasibility of automatically extracting prosodic information from a set of acoustic measurements made on the signal, and the effect of integrating such information on ASR performance.
Abstract: The present communication concerns the use of prosodic parameters in automatic speech recognition (ASR), i.e. the feasibility of automatically extracting prosodic information from a set of acoustic measurements made on the signal, and the effect of integrating such information on the performance of ASR. Prosodic parameters include pauses and contrasts in pitch, duration and intensity between successive segments (mainly the vocalic parts). This notion is also extended to the number of syllables and to ratios of voiced to unvoiced portions of the words. Part one introduces the various aspects of prosody (linguistic and non-linguistic) and the main problems to be solved in automatically extracting linguistic messages conveyed by prosodic features. Part two deals with word level and lexical search: it presents work done (1) on the feasibility of word stress detection (primary stress, estimation of its magnitude, and evaluation of the complete word stress pattern) and (2) on the estimation of the amount of lexical constraints imposed by stress information in lexical search, completed by other suprasegmental information (number of syllables, word boundaries, ratios between voiced and unvoiced portions in the word, etc.). Part three deals with phrase and sentence levels and syntactic constraints provided by the automatic detection of word, phrase and sentence boundaries. Part four relates a number of miscellaneous uses at the phonemic level: phonetic segmentation, identification of the voicing feature of consonants, and estimation of the “segmental quality” of the underlying segments.

36 citations


Journal ArticleDOI
TL;DR: The addition of tactile input to speechreading provided better performance than that obtained by speechreading alone, and the multichannel display was found to be significantly more effective than the single-channel display for perception of pitch rise/fall only.
Abstract: The perception of initial consonant voicing, final consonant voicing, pitch change, and word stress, was measured in six normal subjects, by speechreading alone, by tactile transmission of fundamental voice frequency alone, and by the two in combination. Two tactile displays were used: a single-channel (temporal) display and a 16-channel (spatial) display. By speechreading alone, all contrasts except initial consonant voicing were partially perceptible. By both tactile aids alone, all four contrasts were partially perceptible. The addition of tactile input to speechreading provided better performance than that obtained by speechreading alone. The multichannel display was found to be significantly more effective than the single-channel for perception of pitch rise/fall only.

29 citations


Journal ArticleDOI
TL;DR: Findings extend previous results on rate-dependent processing of overall speaking rate to the processing of local speaking rate and provide further evidence of the importance of extended phonetic context in speech recognition.
Abstract: An examination of the effect of phrase‐final lengthening on the temporal correlates of voicing in syllable‐final /s/ and /z/ was conducted. Discriminant analyses revealed that a combination of vowel duration, frication duration, and the duration of simultaneous voicing and frication was quite successful in determining voicing independently of phrase‐final lengthening. Two perceptual experiments revealed that human listeners’ recognition of the segments does benefit from hearing the syllables in sentential context as opposed to when they are excised from context and presented in isolation. The benefit was greatest for /s/ in phrase‐final position and /z/ in phrase‐internal position. This suggests that the presence of sentential context allows listeners to factor out the influence of phrase‐final lengthening on vowel duration and to more accurately interpret this cue to voicing of the final fricative. These findings extend previous results on rate‐dependent processing of overall speaking rate to the process...

10 citations
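The discriminant-analysis step in the abstract above can be sketched on synthetic duration cues. All numbers are illustrative, not the study's data, and the nearest-class-mean rule is a minimal stand-in for the discriminant analysis actually used:

```python
import random

random.seed(0)

# Hypothetical duration cues (ms), loosely inspired by the abstract:
# /z/ tends to have a longer preceding vowel, shorter frication, and
# more voicing overlapping the frication than /s/.
def sample(is_z):
    if is_z:
        return [random.gauss(180, 20),   # vowel duration
                random.gauss(90, 15),    # frication duration
                random.gauss(40, 10)]    # simultaneous voicing + frication
    return [random.gauss(130, 20),
            random.gauss(130, 15),
            random.gauss(10, 10)]

train = [(sample(v), v) for v in (True, False) for _ in range(200)]

def mean(rows):
    return [sum(r[i] for r in rows) / len(rows) for i in range(3)]

mu_z = mean([x for x, v in train if v])
mu_s = mean([x for x, v in train if not v])

# Classify a token by the nearer class mean in the 3-cue space.
def classify(x):
    dz = sum((a - b) ** 2 for a, b in zip(x, mu_z))
    ds = sum((a - b) ** 2 for a, b in zip(x, mu_s))
    return dz < ds  # True -> /z/

test_set = [(sample(v), v) for v in (True, False) for _ in range(100)]
acc = sum(classify(x) == v for x, v in test_set) / len(test_set)
```

With cue distributions this well separated, the combination of the three durations classifies voicing almost perfectly, mirroring the abstract's point that the cues jointly determine voicing independently of any single one.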


Journal ArticleDOI
TL;DR: In this article, an acoustic investigation was made of the pattern of durations of individual phonetic segments in Telugu using a sound spectrograph and an electrokymograph, which revealed that each segment has its own inherent duration depending on its type, position in an utterance, surrounding phonetic context and the number of sounds/syllables.
Abstract: An acoustic investigation is made of the pattern of durations of individual phonetic segments in Telugu using a sound spectrograph and an electrokymograph. The discussion of phonetic variation in duration includes the following factors: (i) Intrinsic duration of segments of both vowels and consonants and its correlation with (a) vowel height, (b) consonant place, (c) consonant manner and (d) voicing; (ii) Influence of neighboring sounds on long vs short vowel and/or consonant; (iii) The relevance of position in an utterance; (iv) The behavior of consonants in sequences; and (v) The influence of number of syllables and segments in a word. Physical measurements reveal that each segment has its own inherent duration depending on its type, position in an utterance, surrounding phonetic context and the number of sounds/syllables.

Journal ArticleDOI
TL;DR: The effects of two different forms of verbal feedback on speech production were studied in 7 dysarthric speakers and the use of specific feedback to induce articulatory change during speech treatment is discussed.
Abstract: The effects of two different forms of verbal feedback on speech production were studied in 7 dysarthric speakers. Both forms of verbal feedback signaled that the listener failed to understand the message. The more general form of feedback gave no specific cues regarding the reason the listener failed to understand. The more specific feedback indicated that a voiceless initial consonant was perceived as its voiced cognate. The subjects studied had inconsistent voicing errors. Voice onset times (VOTs) and syllabic intensity, duration, and rate were measured in the phrases produced prior to and after verbal feedback. The results showed a significant change in VOT after the specific feedback and no significant change in VOT after the more general feedback. The use of specific feedback to induce articulatory change during speech treatment is discussed.

Journal ArticleDOI
TL;DR: In this article, an acoustic analysis was conducted to test the hypothesis that information signaling voicelessness preceding aspirated consonants may reside in spectral characteristics associated with "breathy voice" at the onset of the following vowel.
Abstract: Acoustic analysis was undertaken to test the hypothesis that information signaling voicelessness of a preceding aspirated consonant may reside in spectral characteristics associated with “breathy voice” at the onset of the following vowel. The study focused upon the enhancement of the first harmonic (H1) resulting from an increase in the open quotient of the voicing source waveform. The relative amplitudes of H1 and H2 were measured in vowels in CVd syllables pairing the consonants /p,t,k,h,b,d,g/ with the vowels /i,ɑ,u,e,Λ/. The syllables were spoken by three male native speakers of English. Harmonic amplitudes were measured from DFT spectra (computed without preemphasis) of the first few pitch periods of the vowel. In general, the amplitude of H1 was greater than that of H2 for vowels following voiceless consonants; the converse was true for voiced consonants. A perceptual study using synthetic continua (varying in VOT) is being conducted; the continua differ in relative amplitude of H1 and H2. Preliminary results indicate that for some subjects an enhanced H1 may contribute to a voiceless percept: The boundary between voiced and voiceless stops was shifted to smaller VOT values for stimuli in which the amplitude of H1 was greater than that of H2 at vowel onset. [Work supported by NIH.]
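The H1-H2 measurement described above can be sketched in a few lines. The synthetic "vowel onset", its harmonic amplitudes, and the sampling parameters are all illustrative assumptions, not the study's stimuli:

```python
import math

FS = 16000   # sampling rate (Hz), assumed
F0 = 120.0   # fundamental frequency (Hz), assumed

# First few pitch periods of a vowel onset as two harmonics. A
# "breathy" onset (after a voiceless aspirated consonant) gets a
# stronger first harmonic; a modal onset gets a stronger second.
def vowel_onset(h1_amp, h2_amp, periods=4):
    n = int(FS * periods / F0)
    return [h1_amp * math.sin(2 * math.pi * F0 * t / FS) +
            h2_amp * math.sin(2 * math.pi * 2 * F0 * t / FS)
            for t in range(n)]

def harmonic_amplitude(x, freq):
    # DFT magnitude evaluated at one frequency, without pre-emphasis,
    # matching the measurement procedure described in the abstract.
    re = sum(s * math.cos(2 * math.pi * freq * t / FS) for t, s in enumerate(x))
    im = sum(s * math.sin(2 * math.pi * freq * t / FS) for t, s in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

def h1_minus_h2_db(x):
    return 20 * math.log10(harmonic_amplitude(x, F0) /
                           harmonic_amplitude(x, 2 * F0))

breathy = vowel_onset(1.0, 0.4)  # voiceless-consonant context
modal = vowel_onset(0.4, 1.0)    # voiced-consonant context
```

A positive H1-H2 (in dB) marks the breathy onset expected after voiceless aspirated consonants; a negative value marks the modal onset after voiced ones.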

Journal ArticleDOI
TL;DR: The perception and production of voice onset time (VOT) was investigated in a Thai patient with an adventitious, profound sensorineural hearing loss, and the results suggested only minor articulatory perturbations.
Abstract: The perception and production of voice onset time (VOT) was investigated in a Thai patient with an adventitious, profound sensorineural hearing loss. Thai exhibits a three-category voicing distinction for bilabial (/b, p, ph/) and alveolar (/d, t, th/) stops, and a two-category distinction for velar (/k, kh/) stops. VOT perception was measured in labeling responses to synthetic speech continua differing in VOT; VOT production was measured in word-initial stops of words produced in isolation. These measurements were compared with previously published VOT data for normal-hearing Thai speakers. The results of acoustic analyses of this subject's productions suggested only minor articulatory perturbations, and the target phonemes were generally identified accurately by normal listeners.

Dissertation
01 Jan 1988
TL;DR: In this paper, the effect of the duration of the closure/stricture phase on the perception of Dutch two-obstruent sequences (C1C2) was investigated.
Abstract: The perception of voicing in Dutch two-obstruent sequences (C1C2) was studied as a function of the durations of the two “consonants”, i.e. the duration of the closure/stricture phase. In a second experiment the effect of “preceding vowel” duration, that is the duration of the “vowel” resonance, was investigated. Both experiments employed synthetic non-word stimuli of the type VCCV, in which both “consonants” were synthesized without periodicity. The results indicate that the duration of the first “consonant” (C1) does not affect perception. Longer “C2”-durations led to more [-voice] percepts for C2, resulting in a shift in the responses from voiced–voiced and voiceless–voiced to voiced–voiceless and voiceless–voiceless. “Preceding vowel” duration affected the perception of C1, not that of C2. Longer “vowel” durations gave rise to more [+ voice] percepts for C1, thus causing a shift in the responses from voiceless–voiceless and voiceless–voiced to voiced–voiceless and voiced–voiced.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the perception of voicing in Dutch two-obstruent sequences (C1C2) is affected by, among other factors, voice onset time, voice termination time (the period between oral closure and voice offset), duration of preceding vowel resonance, and constriction duration of the second consonant in the sequence.

Journal ArticleDOI
TL;DR: This article found that morphophonological processes in the acquisition of Luo plurals and possessives present different degrees of difficulty for the children, with the most difficult being phone substitution, i.e. a change in mode and place of articulation.

Journal ArticleDOI
TL;DR: This paper found that low frequency periodicity can usually be found at VC and CV boundaries, specifically in the initial 30 ms or the final 20 ms of a phonetically "voiced" fricative.
Abstract: Measurements of voicing have shown that low‐frequency periodicity can usually be found at VC and CV boundaries, specifically in the initial 30 ms or the final 20 ms of a phonetically “voiced” fricative. The purpose of this study is to determine the perceptual importance of voicing at VC and CV boundaries for fricatives. The VCV tokens with alveolar fricatives were synthesized with various fricative durations and various patterns of voicing within the consonant. Listeners identified the consonant as either [s] or [z]. Short fricatives are heard as [z] but, for durations longer than 70 ms, voicing must occur at either the VC or CV boundary. Voicing placed in the middle of the fricative did not elicit [z] responses. When voicing occurred at the boundaries, perception was influenced by fricative duration, and by amplitude and duration of voicing. Possible theoretical interpretations of the data will be discussed. [Work supported by grants from NINCDS to Brown University and to MIT.]

Journal ArticleDOI
TL;DR: In this article, the authors show that reaction times for place classification in a condition in which stimuli vary along both place and voicing (the orthogonal condition) are longer than RTs in a control condition where stimuli vary only in place.
Abstract: Previous studies using speeded classification paradigms have been used to test whether the information that underlies the perception of place and voicing is processed in an integral as opposed to a separate fashion. Results of auditory-only experiments show that reaction times (RTs) for place classification in a condition in which stimuli vary along both place and voicing (the orthogonal condition) are longer than RTs in a control condition in which stimuli vary only in place. These data are taken in support of the idea that the information underlying place and voicing are integrally processed. This study was concerned with whether or not place classification would demonstrate the same pattern of results when the place feature was determined by both auditory and visual (the face of the talker) information. To do this, the auditory tokens /ibi/ and /ipi/ were paired with a video display of a talker saying /ibi/ or /igi/, and presented to subjects for speeded classification of place or voicing. Because of t...

Journal ArticleDOI
TL;DR: This article found that F0 contributes to the voicing distinction, even when the categorization is not changed, and F0 cues a voiced response incrementally as it starts below the F0 of the remainder of the syllable.
Abstract: Earlier work [A. S. Abramson and L. Lisker, in Phonetic Linguistics (1985)] demonstrated that falling fundamental frequency (F0) after a syllable‐initial stop was a cue to voicelessness, and that flat or rising F0 was a cue for voiced stops, but only when the voice onset time (VOT) was ambiguous. The present study replicated that finding with seven VOT values and five onset F0 values. In the first condition, subjects identified the stop as “b” or “p”. Results were nearly identical to the previous experiment. A second condition included not just the stop decision, but a reaction time as well. Here, inappropriate F0 slowed response time even for unambiguous VOTs. A final condition was, like the first, identification without time pressure. Here, it was found that subjects were distinguishing all five levels of F0 onset so that, the lower the onset was, the more “b” responses were obtained in the ambiguous region. Thus F0 contributes to the voicing distinction, even when the categorization is not changed. Also, F0 cues a “voiced” response incrementally as it starts below the F0 of the remainder of the syllable. [Work supported by NIH Grant No. HD‐01994.]

Journal ArticleDOI
TL;DR: In this paper, the authors report detailed measurements of voice source properties during transitions between consonants and vowels, including oral air flow, oral air pressure, and transillumination from two subjects producing reiterant speech with the vowel /ae/ and different voiced and voiceless consonants.
Abstract: Studies of the voice source during speech have been concerned mostly with variations in fundamental frequency and amplitude. However, the source variations also comprise the spectral characteristics and the harmonics‐to‐noise ratio. The present study reports detailed measurements of voice source properties during transitions between consonants and vowels. Oral air flow, oral air pressure, and transillumination were recorded from two subjects producing reiterant speech with the vowel /ae/ and different voiced and voiceless consonants; the flow signal was inverse‐filtered to obtain an estimate of the glottal pulse. Results indicate a breathy phonation at the release of voiceless consonants as indicated by an open quotient close to 1. Peak flow during each glottal pulse is thus high at voicing onset following a voiceless consonant and decreases, with an undershoot, during the first 20 pitch periods. The source pulse is also skewed to the left during the first pitch periods following voiceless consonants. Sour...


Journal ArticleDOI
TL;DR: This paper measured European French and Canadian English labial stops and found that the French /p/ and English /b/ categories are very similar, often exhibiting only small differences in voice onset time.
Abstract: Measurements of European French and Canadian English labial stops indicate that the French /p/ and English /b/ categories are very similar, often exhibiting only small differences in voice onset time. Two groups of English‐speaking listeners (one with some knowledge of French, the other without) were asked to categorize a set of modified natural tokens from these categories as either /p/ or /b/. The tokens were taken from word‐initial stops produced in a sentence context. Components preceding the release bursts were removed, and the signals were truncated at 68 ms. Despite the small difference in VOT between the categories, most listeners were able to reliably separate them at levels above chance. Analysis revealed that the listeners may have relied primarily on VOT in making their judgments and that overall amplitude [C. J. Darwin and M. Pearson, Speech Commun. 1, 29–44 (1982)] may have played a secondary role. These findings indicate that listeners may be sensitive to small differences between categories in their native language and analogous categories in a foreign language.

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The authors present a method for determining the effect of acoustic coupling of the vocal tract with the glottal and subglottal sections of the voice-production system, and also an algorithm which incorporates the acoustic coupling in vowel production and in vowel-vowel coarticulation in a cascaded formant synthesizer in real time.
Abstract: Summary form only given, as follows. The quality of vowel sounds in synthetic speech produced by commercially available formant synthesizers very often lacks the desired naturalness. One reason is that most systems are implemented without including the effect of acoustic coupling of the vocal tract with the glottal and subglottal sections of the voice-production system. The authors present a method for determining the effect of this interaction on the shape of the glottal volume velocity signal, and also an algorithm which incorporates the acoustic coupling in vowel production and in vowel-vowel coarticulation in a cascaded formant synthesizer in real time.
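As a baseline for comparison, a conventional cascade formant synthesizer without the source/tract coupling the paper addresses can be sketched as follows. The formant frequencies, bandwidths, and impulse-train source are illustrative assumptions, and the crude source is exactly what the paper argues against:

```python
import math

FS = 10000  # sampling rate (Hz), assumed

def resonator(freq, bw):
    """Second-order IIR resonator: one formant of a cascade synthesizer."""
    r = math.exp(-math.pi * bw / FS)
    b1 = 2 * r * math.cos(2 * math.pi * freq / FS)
    b2 = -r * r
    a0 = 1 - b1 - b2  # normalize for unity gain at DC
    def run(x):
        y, y1, y2 = [], 0.0, 0.0
        for s in x:
            v = a0 * s + b1 * y1 + b2 * y2
            y.append(v)
            y1, y2 = v, y1
        return y
    return run

# Impulse-train excitation at F0: a crude glottal source with none of
# the coupling-induced shaping the paper models.
F0 = 100
src = [1.0 if t % (FS // F0) == 0 else 0.0 for t in range(FS // 2)]

# Cascade of three formants for a schwa-like vowel (values illustrative).
out = src
for f, bw in [(500, 60), (1500, 90), (2500, 120)]:
    out = resonator(f, bw)(out)
```

Replacing `src` with a glottal pulse shaped by tract interaction, as the paper proposes, is what distinguishes their synthesizer from this textbook cascade.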

Proceedings ArticleDOI
24 Jun 1988
TL;DR: A method for reducing the effect of interference on the spectrum of voiced speech sounds is described, which divides the spectrum into a number of frequency regions and uses the strength of voicing in each region as a measure of signal strength in that region.
Abstract: A method for reducing the effect of interference on the spectrum of voiced speech sounds is described. The method divides the spectrum into a number of frequency regions and then uses the strength of voicing in each region as a measure of signal strength in that region. This approach is particularly suited to cases where the interference is made up of one or more sinusoidal waves. The output from the proposed system was also used to control a speech synthesizer. Informal listening tests showed that the speech produced was understandable, although it sounded metallic and machine-like due to the nature of the excitation signal that was used.
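One way to read the method: divide the spectrum into bands and score each band by how much of its energy is harmonic. Below is a minimal sketch under assumed parameters; the paper's actual band layout and voicing-strength measure are not specified here:

```python
import math

FS = 8000
F0 = 100.0
N = 800  # 100-ms frame; bin spacing FS/N = 10 Hz, so harmonics of F0
         # fall exactly on bins 10, 20, 30, ...

# Stand-in for voiced speech: harmonics of F0 with 1/k rolloff, plus a
# strong sinusoidal interferer at 1250 Hz (bin 125, not a harmonic).
sig = [sum(math.sin(2 * math.pi * k * F0 * t / FS) / k for k in range(1, 21)) +
       3.0 * math.sin(2 * math.pi * 1250.0 * t / FS)
       for t in range(N)]

def dft_mag(x):
    # Naive DFT magnitudes for bins 0 .. N/2-1 (slow but dependency-free).
    n = len(x)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(x))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(x))
        mags.append(math.hypot(re, im))
    return mags

mags = dft_mag(sig)
harmonic_bins = set(range(10, len(mags), 10))  # multiples of F0

def voicing_strength(lo_bin, hi_bin):
    # Fraction of band energy lying on harmonics of F0: near 1 for a
    # clean voiced band, near 0 where the interferer dominates.
    total = sum(mags[b] ** 2 for b in range(lo_bin, hi_bin))
    harm = sum(mags[b] ** 2 for b in range(lo_bin, hi_bin) if b in harmonic_bins)
    return harm / total

bands = [(1, 100), (100, 200)]  # 10-990 Hz and 1000-1990 Hz
strengths = [voicing_strength(lo, hi) for lo, hi in bands]
```

The clean low band scores near 1 while the band containing the sinusoidal interferer scores near 0, which is the per-region signal-strength measure the abstract describes.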

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The authors compare this voicing encoding strategy to a novel one which uses an extra pulse per period when voicing is present in the input signal. Results were encouraging: one subject achieved 100% discrimination with the new strategy (after very limited training), compared to 85% obtained using the old strategy.
Abstract: Voicing is the feature that indicates whether a speech sound is quasiperiodic or aperiodic. It is used perceptually to discriminate pairs of sounds such as /s,z/, /p,b/, /f,v/, etc. The Nucleus WSP-III multichannel speech processor uses a stimulation rate equal to the fundamental frequency of the input speech signal: two pulses are sent in rapid sequence during each fundamental period. When speech is unvoiced and a fundamental frequency cannot be determined, a random stimulation rate of approximately 100 Hz is used. The processor therefore uses the stimulation rate to encode voicing: unvoiced sounds are delivered using a random rate while voiced sounds are delivered using a more stable rate. The authors compare this voicing encoding strategy to a novel one which uses an extra pulse per period when voicing is present in the input signal. Results were encouraging: one subject achieved 100% discrimination with the new strategy (after very limited training), compared to 85% obtained using the old strategy.
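The two stimulation schedules described above can be sketched as pulse-time generators. The time resolution, pulse spacing, and unvoiced-rate jitter are illustrative parameters, not the WSP-III's actual timing:

```python
import random

random.seed(1)
RATE = 10000  # samples per second of the pulse schedule (illustrative)

def old_strategy(voiced, f0, duration):
    """Pulse pair per fundamental period when voiced; random ~100 Hz
    rate when unvoiced (the existing encoding)."""
    pulses, t = [], 0
    while t < duration:
        pulses += [t, t + 10]  # two pulses in rapid sequence
        t += (RATE // int(f0) if voiced
              else int(RATE / max(50.0, random.gauss(100, 20))))
    return pulses

def new_strategy(voiced, f0, duration):
    """Same schedule, plus one extra pulse per period when voiced
    (the novel encoding)."""
    pulses, t = [], 0
    while t < duration:
        pulses += [t, t + 10] + ([t + 20] if voiced else [])
        t += (RATE // int(f0) if voiced
              else int(RATE / max(50.0, random.gauss(100, 20))))
    return pulses
```

For a one-second voiced segment at 125 Hz, the old strategy emits two pulses per period and the new one three, so the extra-pulse cue is present on every period whenever voicing is detected.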