scispace - formally typeset

Showing papers on "Voice" published in 1978


Journal ArticleDOI
TL;DR: A series of experiments examined listeners' ability to detect mispronounced words in a short story and showed that prestressed word-initial stop consonants are more perceptible than other consonants.
Abstract: A series of experiments examined listeners’ ability to detect mispronounced words in a short story. Mispronunciations were produced by changing a single consonant segment in a word to produce a (phonologically permissible) nonsense word. The results of six different experiments showed that prestressed word‐initial stop consonants are more perceptible than other consonants. For example, mispronunciations produced by changing the voicing of a word‐initial stop (e.g., “boy” to “poy”) were detected about 70% of the time, while changes in voicing of a word‐initial fricative (e.g., “voice” to “foice”) were detected about 38% of the time. Mispronunciations produced by changing the place of articulation of a prestressed word‐initial stop were most detectable of all (80% to 90% detection) for three different speakers. A change in place of articulation of a word‐initial stop (e.g., “baby” to “daby”) was detected as often as a change in both place of articulation and voicing (e.g., “baby” to “taby”). Finally, it was found that a mispronunciation was detected about twice as often in word‐initial as in word‐final position in one‐syllable words for both stops and nasals. The results suggest that listeners pay special attention to word‐initial stop consonants in natural continuous speech.

103 citations


Journal ArticleDOI
TL;DR: In this article, an excitation source model for speech compression and synthesis is presented that allows the degree of voicing to be varied continuously by mixing voiced and unvoiced excitations in a frequency-selective manner.
Abstract: This paper presents an excitation source model for speech compression and synthesis that allows the degree of voicing to be varied continuously by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency‐selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low‐frequency region and the noise source exciting the high‐frequency region. The degree of voicing is specified by a parameter Fc, which corresponds to the cut‐off frequency between the voiced and unvoiced regions. For speech compression applications, Fc can be extracted automatically from the speech spectrum and transmitted. Experiments performed with the new model indicate its power in synthesizing natural sounding voiced fricatives and in largely eliminating the “buzzy” quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.
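
The two-region mix described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sum-of-sinusoids construction, the function name `mixed_excitation`, and the default pitch, sampling rate, and frame length are all assumptions chosen for illustration. Pitch harmonics fill the band below Fc (the pulse source), and random-phase components fill the band from Fc upward (a stand-in for the noise source).

```python
import math
import random


def mixed_excitation(fc_hz, f0_hz=100.0, fs_hz=8000, n_samples=400, seed=0):
    """Build one excitation frame: pitch harmonics below fc_hz, noise-like
    random-phase components at and above fc_hz.

    All defaults are illustrative assumptions, not values from the paper.
    Returns (frame, number_of_voiced_harmonics).
    """
    rng = random.Random(seed)
    nyquist = fs_hz / 2.0
    components = []  # list of (frequency_hz, phase_rad)

    # Voiced region: zero-phase harmonics of f0 (a band-limited pulse train).
    k = 1
    while k * f0_hz < min(fc_hz, nyquist):
        components.append((k * f0_hz, 0.0))
        k += 1
    n_voiced = len(components)

    # Unvoiced region: one random-phase sinusoid per frequency bin from Fc up.
    step = fs_hz / n_samples  # bin spacing for this frame length
    for m in range(1, n_samples // 2):
        f = m * step
        if f >= fc_hz:
            components.append((f, rng.uniform(0.0, 2.0 * math.pi)))

    frame = [
        sum(math.cos(2.0 * math.pi * f * n / fs_hz + ph) for f, ph in components)
        for n in range(n_samples)
    ]
    return frame, n_voiced


# Fc at the band edge gives a fully voiced (periodic) frame; Fc = 0 gives
# a fully unvoiced one; values in between mix the two sources.
frame, n_voiced = mixed_excitation(fc_hz=1000.0)
```

Under these assumptions, moving `fc_hz` between 0 and half the sampling rate varies the degree of voicing continuously, which is the role the abstract assigns to Fc.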

83 citations


Journal ArticleDOI
TL;DR: Investigation of voice onset time in stop production demonstrated that the VOTs of the apraxic subject differed markedly from those of normal subjects, yielding a compression of the two categories and a marked overlap.

82 citations


Journal ArticleDOI
TL;DR: Analysis of the data demonstrates that the tactile transform enables receivers to achieve excellent recognition of vowels in CVC context and the consonantal features of voicing and nasality, which leads to recognition performance in the combined condition (visual plus tactual) which far exceeds either reception condition in isolation.
Abstract: Four normal‐hearing young adults have been extensively trained in the use of a tactile speech‐transmission system. Subjects were tested in the recognition of various phonetic elements including vowels, and stop, nasal, and fricative consonants under three receiving conditions; visual reception alone (lipreading), tactile reception alone, and tactile plus visual reception. Subjects were artificially deafened using earplugs and white noise and all speech tokens were presented live voice. Analysis of the data demonstrates that the tactile transform enables receivers to achieve excellent recognition of vowels in CVC context and the consonantal features of voicing and nasality. This, in combination with high recognition of vowels and the consonantal feature place of articulation through visual reception, leads to recognition performance in the combined condition (visual plus tactual) which far exceeds either reception condition in isolation.

73 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the glottis is at least partially open at each position of articulation, but it is not established how much of this opening is cause and how much effect.

72 citations



Proceedings ArticleDOI
10 Apr 1978
TL;DR: An excitation source model for speech compression and synthesis is presented, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner.
Abstract: This paper presents an excitation source model for speech compression and synthesis, which allows for a degree of voicing by mixing voiced (pulse) and unvoiced (noise) excitations in a frequency-selective manner. The mix is achieved by dividing the speech spectrum into two regions, with the pulse source exciting the low-frequency region and the noise source exciting the high-frequency region. A parameter Fc determines the degree of voicing by specifying the cut-off frequency between the voiced and unvoiced regions. For speech compression applications, Fc can be extracted automatically from the speech spectrum and transmitted. Experiments using the new model indicate its power in synthesizing natural sounding voiced fricatives, and in largely eliminating the "buzzy" quality of vocoded speech. A functional definition of buzziness and naturalness is given in terms of the model.

58 citations


Journal ArticleDOI
TL;DR: The authors investigated the relation between the acoustic characteristics of final stop syllables and the perception of the voicing distinction and found that the formant transitions, closure, burst, and vowel duration are important in determining whether a stimulus is heard as voiced or voiceless.

57 citations


Journal ArticleDOI
TL;DR: The location of the voicing boundary in the perception of initial stop consonants is shown to vary according to the range of voice-onset times used in a block of trials, and may provide a metric for assessing the auditory tolerance of phonological categories.
Abstract: The location of the voicing boundary in the perception of initial stop consonants is shown to vary according to the range of voice‐onset times used in a block of trials and according to the order in which blocks covering different ranges are presented. Although these range effects introduce methodological complications into the interpretation of adaptation experiments, they appear to be qualitatively different from adaptation effects and, it is suggested, may provide a metric for assessing the auditory tolerance of phonological categories.

49 citations


Journal ArticleDOI
TL;DR: The VOT measure has been said to provide the single most nearly adequate physical basis for separating homorganic stop categories across a variety of languages, granted that other features may also be involved.
Abstract: The VOT measure has been said to provide the single most nearly adequate physical basis for separating homorganic stop categories across a variety of languages, granted that other features may also be involved. That transition duration affects perceived voicing of synthesized initial stops of one specific language, English, has suggested the hypothesis by Stevens and Klatt (1974) that a detector responsive to rapid formant-frequency shifts after voice onset better explains the child's acquisition of the contrast than does some mechanism which responds to VOT directly. If such a detector is part of our biological equipment, then it seems remarkably underutilized in language, for the hypothesis asserts that basic to voicing perception is whether laryngeal signal is or is not present during the interval in which the stop-vowel transition occurs. In effect, the “archetypical” voiceless stop is aspirated. Not only do many languages not possess voiceless aspirates, but even in English aspiration is severely res...

38 citations


Journal ArticleDOI
TL;DR: The range and reliability of the laterality effects obtained, as well as certain other methodological features, make the present tests promising as tools for assessing individual differences in ear dominance.

Journal ArticleDOI
TL;DR: It was found that there were apparent reciprocal patterns in the posterior cricoarytenoid (PCA) and the interarytenoid (INT) in terms of significant negative correlation, and active control of PCA for voicelessness was demonstrated.
Abstract: The aim of the present study was to investigate the laryngeal adjustments for voiced versus voiceless distinction in Japanese consonant production by means of laryngeal electromyography (EMG) and fiberoptic observation. Multichannel EMG recordings were taken of a Japanese subject and the data were computer-processed to obtain the averaged activity patterns of the five intrinsic laryngeal muscles with special reference to the voicing distinction in consonant production in various phonetic environments. It was found that there were apparent reciprocal patterns in the posterior cricoarytenoid (PCA) and the interarytenoid (INT) in terms of significant negative correlation, and active control of PCA for voicelessness was demonstrated. The patterns of the thyroarytenoid and the lateral cricoarytenoid were different from that of INT even though these two muscles are usually classified as the members of the adductor group, and their activity levels were apparently influenced by the phonetic environment. A possible contribution of the cricothyroid (CT) to the voicing distinction was also pointed out but further investigations on acoustic parameters seem to be mandatory in more critical interpretation of CT activity in speech.

Journal ArticleDOI
01 May 1978-Lingua
TL;DR: In this paper, the authors elucidate some of the principles governing cross-linguistic variation in such phonological processes as Terminal Devoicing and Intervocalic Voicing and show that the theory of atomic phonology provides a correct characterization of these processes and their associated constraints.

Journal ArticleDOI
TL;DR: In this article, an experimental investigation of a hypothesized perceptual cue in the determination of the voicing status of postvocalic English stop consonants was conducted, and no empirical support for Parker's hypothesis was found.

Proceedings ArticleDOI
10 Apr 1978
TL;DR: A real time phonetic voice synthesizer roughly the size of a small hi-fi amplifier has been developed that accepts a string of phoneme commands, each consisting of 8 bits, and simulates the transfer function of the human vocal tract.
Abstract: A real time phonetic voice synthesizer roughly the size of a small hi-fi amplifier has been developed. It accepts a string of phoneme commands, each consisting of 8 bits. 6 bits determine the phoneme uttered while 2 bits determine the inflection associated with that phoneme. The synthesizer contains an active filter network which simulates the transfer function of the human vocal tract. This analog network is excited by both voicing and fricative sound sources. The sound sources and the vocal tract filter transfer function are dynamically manipulated in response to the numerous phoneme command sequences to produce articulatory synthesis by rule.
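
The 8-bit command format (6 bits of phoneme, 2 bits of inflection) can be sketched as a simple bit-field split. The abstract does not say which bits carry which field, so the layout below (phoneme in the low 6 bits, inflection in the top 2) and the function names are assumptions for illustration only.

```python
def decode_command(byte):
    """Split an 8-bit phoneme command into (phoneme, inflection).

    Assumed layout (not stated in the abstract): bits 0-5 hold the
    phoneme code (0-63), bits 6-7 hold the inflection level (0-3).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("command must fit in 8 bits")
    phoneme = byte & 0x3F            # low 6 bits
    inflection = (byte >> 6) & 0x03  # top 2 bits
    return phoneme, inflection


def encode_command(phoneme, inflection):
    """Pack a phoneme code and inflection level into one command byte."""
    if not 0 <= phoneme <= 0x3F:
        raise ValueError("phoneme code must fit in 6 bits")
    if not 0 <= inflection <= 0x03:
        raise ValueError("inflection must fit in 2 bits")
    return (inflection << 6) | phoneme


# A string of commands is then just a sequence of such bytes.
commands = bytes([encode_command(5, 2), encode_command(63, 0)])
decoded = [decode_command(b) for b in commands]
```

Whatever the real bit order, 6 bits allow 64 distinct phoneme codes and 2 bits allow 4 inflection levels per phoneme.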

Journal ArticleDOI
TL;DR: The fuzzy logical model provided a good account for the data of this experiment and implies that place and voicing feature information are evaluated independently before being integrated during phoneme identification.

Journal ArticleDOI
TL;DR: Multidimensional scaling analyses of three types of English consonant confusions are reported: consonant substitutions in spontaneous speech errors, CV perceptual confusions, and VC perceptual confusions.
Abstract: Multidimensional scaling analyses of three types of English consonant confusions are reported: consonant substitutions in spontaneous speech errors, CV perceptual confusions, and VC perceptual confusions. Two data sets of each type are analyzed to assess reliability. Three reliable dimensions emerge in all data sets, corresponding to voicing, stop/fricative, and place of articulation. Representation of consonants in terms of categorical phonological features exhaustively describes what is common to the configurations of different data types, even though there is reliable detail within each data type that is not captured by categorical features. Such features can be viewed as groupings of speech sounds common to various perception and production processes.

Journal ArticleDOI
TL;DR: The analysis of Dutch voice assimilation presented in Hubers and Kooij (1973; hereafter, H&K) represents a considerable improvement over previous generative accounts as discussed by the authors.
Abstract: The analysis of Dutch voice assimilation presented in Hubers and Kooij (1973; hereafter, H&K) represents a considerable improvement over previous generative accounts. Especially the suggestion that two distinctive features, [± Vce] (voicing) and [± Tns] (tenseness), rather than just one, [± Vce], should be used in the analysis has enabled the authors to achieve a much more detailed phonetic description of this phenomenon than had previously been possible. However, in spite of these improvements, certain facts have remained unaccounted for in their presentation. In this paper I will show that, by altering slightly the underlying forms assumed and the phonological rules required, it is possible to account naturally for these other facts while retaining all of the basic advantages of H&K's approach.


Proceedings ArticleDOI
01 Apr 1978
TL;DR: For a number of popular voicing statistics (zero-crossing rate, spectral slope, and low-frequency energy), the voicing decision is improved by use of context, in fact by use of just the previous segment.
Abstract: Voicing decisions in speech compression or recognition procedures are usually made in a context-free manner on successive fixed-length segments of speech. For a number of popular voicing statistics (zero-crossing rate, spectral slope, and low-frequency energy), the voicing decision is improved by use of context, in fact by use of just the previous segment. For each statistic, instead of looking for a threshold that selects voiced segments, we use two thresholds, one if the last segment was called voiced and the other if the last segment was unvoiced. A typical improvement obtained by allowing this 'hysteresis' in the voicing decision is a 15 percent drop in error rate.
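
The two-threshold rule in this abstract can be sketched in a few lines. The statistic values, the threshold values, and the assumption that a larger statistic means "more voiced" (as with low-frequency energy) are illustrative choices, not figures from the paper.

```python
def voicing_decisions(stats, thr_after_voiced, thr_after_unvoiced,
                      start_voiced=False):
    """Classify successive segments as voiced (True) or unvoiced (False).

    The threshold applied to each segment depends on the decision made for
    the previous segment, which gives the decision hysteresis. Assumes a
    statistic where larger values indicate voicing.
    """
    decisions = []
    prev_voiced = start_voiced
    for value in stats:
        threshold = thr_after_voiced if prev_voiced else thr_after_unvoiced
        prev_voiced = value > threshold
        decisions.append(prev_voiced)
    return decisions


# Hypothetical per-segment statistic values.
stats = [0.2, 0.7, 0.5, 0.45, 0.3, 0.5]

# With hysteresis: easier to stay voiced (0.4) than to become voiced (0.6).
with_context = voicing_decisions(stats, thr_after_voiced=0.4,
                                 thr_after_unvoiced=0.6)

# Context-free baseline: the same threshold regardless of the last decision.
context_free = voicing_decisions(stats, thr_after_voiced=0.5,
                                 thr_after_unvoiced=0.5)
```

In this toy sequence the borderline values 0.5 and 0.45 stay voiced after a clearly voiced segment under the two-threshold rule, while the single threshold drops them.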

Journal ArticleDOI
TL;DR: A new class of tonal sounds can be generated by repeating brief sections of noise over and over without intervening silence. When the repeated waveform is white noise, a "white tone" with a rich distinctive timbre and no noise-like quality is heard over a considerable range of repetition rates.
Abstract: A new class of tonal sounds can be generated by repeating brief sections of noise over and over without intervening silence. When the repeated waveform is white noise, a “white tone” with a rich distinctive timbre and no noise-like quality is heard over a considerable range of repetition rates. If the noise is a whispered vowel rather than white noise, repetition of a sample equal in duration to a single glottal pulse during voicing can generate a “whisper tone” sounding like a voiced version of the vowel. Whispered discourse can be converted to an intelligible voiced monotone by repetition of regularly spaced samples drawn from the whispered speech.
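
The construction of a repeated-noise tone can be sketched as follows. The uniform random samples, segment length, and sampling rate are assumptions for illustration; the abstract does not specify how the noise was generated.

```python
import random


def repeated_noise(period_samples, n_repeats, seed=0):
    """Tile one brief white-noise segment with no intervening silence.

    The result is exactly periodic, with period equal to the segment length.
    Uniform noise is an illustrative choice of white-noise source.
    """
    rng = random.Random(seed)
    segment = [rng.uniform(-1.0, 1.0) for _ in range(period_samples)]
    return segment * n_repeats


def repetition_rate_hz(period_samples, fs_hz):
    """Repetition rate (the fundamental frequency of the periodic signal)."""
    return fs_hz / period_samples


# An 80-sample segment at an assumed 8000 Hz sampling rate repeats 100 times
# per second, roughly the duration of one glottal period for a low voice.
tone = repeated_noise(period_samples=80, n_repeats=5)
rate = repetition_rate_hz(80, 8000)
```

Because every period is identical, the signal's energy falls on harmonics of the repetition rate, which is consistent with the tonal rather than noise-like percept the abstract reports.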

Journal ArticleDOI
TL;DR: This article showed that speakers who rarely if ever produce long-lag stops themselves place their category boundary between the short- and long-lag regions, thus showing a dissociation between production and perception, and further work will vary the test conditions and/or subjects' response categories to determine if a boundary exists between the prevoiced and short-lag regions of the continuum.
Abstract: Polish is traditionally described as using the prevoiced and short‐lag categories to contrast its voiced and voiceless stops. However, Moslin and Keating [J. Acoust. Soc. Am. 62, S27 (A) (1977)] have shown that some speakers of Polish make use of the long‐lag, aspirated voicing category. Preliminary results for six speakers of Polish on a /da/‐/ta/ continuum with VOT from −20 to +80 ms indicate that all speakers, regardless of how they produce their apical stops, show a labeling boundary and discrimination peak at about 35 ms. That is, speakers who rarely if ever produce long‐lag stops themselves place their category boundary between the short‐ and long‐lag regions, thus showing a dissociation between production and perception. Further work will vary the test conditions and/or subjects' response categories to determine if a boundary exists between the prevoiced and short‐lag regions of the continuum.

Journal ArticleDOI
TL;DR: In this article, the authors present a broad transcription of the Treger dialect, where stressed vowels are half long unless before fortis (voiceless, double) consonants or consonant clusters, where they are short.
Abstract: Fairly broad transcription. Stress is strong. Unstressed vowels are short, stressed vowels are half long unless before fortis (voiceless, double) consonants or consonant clusters, where they are short. Adjacent vowels are in hiatus and thus form two syllables, w, j are consonantal except when final or before a consonant where they represent the second element of falling closing diphthongs. , a = a in W. Treger, a in E. Treger. Contingent nasality before nasal consonants is not marked, e, a are e, a reduced towards ə except in the slowest, clearest forms of speech, θ = rounded ə. Lenis obstruent devoicing in final pausal position and in sandhi is marked .; fortis obstruent voicing in sandhi is marked ˅. h is a lenis, usually unvoiced, with some voicing possible between vowels and next to liquids; in final pausal position or in sandhi = x. m is a fortis. ɲ is a fortis; there is usually a j-glide between it and a preceding vowel, r is a light flap or trill; with some speakers it is ɻ; in some parts of Brittany it is R or B, but not in Treger; when written r it is not usually heard except in slow, clear forms of speech. I may be heard velarized in some districts, but not in Treger. t, d, n may be somewhat advanced towards a dental position, p, t, k may have slight aspiration except after s.

Journal ArticleDOI
TL;DR: The authors measured stop duration (VOT and closure) and intraoral air pressure for comparison of the production of glottalized stops with that of nonglottalized stops in K'ekchi.
Abstract: Stop duration (VOT and closure) and intraoral air pressure were measured for comparison of the production of glottalized stops with that of nonglottalized stops. The pitch and duration of the preceding and following vowels were also measured. Subjects read natural language minimal pairs in which glottalized and nonglottalized stops contrasted in word initial, medial, and final positions. The results establish preliminary acoustic and physiological variables by which glottalized stops in K'ekchi may be characterized and distinguished from such stops in other languages. These glottalized stops had a significantly greater VOT than their nonglottalized counterparts. However, /b′/ had two productions in free variation: (1) a voicing lead and (2) a zero VOT. Glottalized /t′/, /k′/, and /q′/ exhibited a greater positive air pressure than their nonglottalized counterparts. /b′/ exhibited a zero air pressure, thus demonstrating that the inventory of glottalized stops in K'ekchi consists of one bilabial implosive and three ejectives. The pitch of a vowel following a glottalized stop began at a lower frequency than that of a vowel following a nonglottalized stop except when the vowel was found before /q′/ and /q/. The pitch in these latter cases was the same.

Journal ArticleDOI
TL;DR: This paper reports that stop closure duration is a cue to the voicing of medial stops in English trochees, that /b/ closures are regularly shorter than /p/ closures in words such as rabid rapid, and that this difference has perceptual/phonetic significance.
Abstract: The evidence that closure duration is a cue to the voicing of medial stops in English trochees is as convincing as any we have for other acoustic features considered to be factors governing the linguistic interpretation of speech signals. Measurements of natural speech show /b/ closures to be regularly shorter than /p/ closures in words such as rabid rapid, and there are experimental data to indicate that this difference has perceptual/phonetic significance. Another closure feature, glottal pulsing, also plays a role in the /b/‐/p/ distinction in medial position. New data gathered to test the reliability of these two features as cues to the intelligibility of naturally produced tokens of rabid rapid indicate (1) stop closure duration does not suffice to separate /b/ from /p/ across speakers, (2) the phonetic effect of manipulating silent “closure” differs greatly for different tokens of the source word produced by a single speaker, and (3) the effect of replacing buzz with silence in natural tokens of rab...