
Showing papers on "Voice" published in 1990


Journal ArticleDOI
TL;DR: Perceptual validation of the relative importance of acoustic cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices.
Abstract: Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases in the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases in the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases in lower formant bandwidths; and (4) introduction of extra pole-zero pairs in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases in the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably, tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.
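The first two breathiness cues listed above lend themselves to a compact illustration. Below is a minimal sum-of-harmonics sketch, not the KLSYN88 model itself; every parameter value (the H1 boost, the noise level, the upper-harmonic attenuation) is invented for the demonstration:

```python
import numpy as np

def breathy_vowel(f0=200.0, dur=0.5, sr=16000, h1_boost=2.0,
                  noise_level=0.1, n_harmonics=20):
    """Toy voicing source illustrating two breathiness cues: a boosted
    fundamental (H1) and aspiration noise replacing upper harmonics.
    A sketch only, not a formant-synthesizer source model."""
    t = np.arange(int(dur * sr)) / sr
    src = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        # Harmonic series with 1/k rolloff; the fundamental gets an extra boost.
        amp = (h1_boost if k == 1 else 1.0) / k
        # Attenuate upper harmonics, as if replaced by aspiration noise.
        if k > n_harmonics // 2:
            amp *= 0.3
        src += amp * np.sin(2 * np.pi * k * f0 * t)
    # Aspiration noise component.
    rng = np.random.default_rng(0)
    src += noise_level * rng.standard_normal(len(t))
    return src / np.max(np.abs(src))

signal = breathy_vowel()
```

Raising `h1_boost` and `noise_level` together moves this toy source toward the breathy end of the continuum described in the abstract.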

1,656 citations


Journal ArticleDOI
TL;DR: The results showed that the processing of a talker’s voice and the perception of voicing are asymmetrically dependent, and the processing of voice information appears to be qualitatively different from the encoding of segmental phonetic information, although they are not independent.
Abstract: Processing dependencies in speech perception between voice and phoneme were investigated using the Garner (1974) speeded classification procedure. Variability in the voice of the talker and in the cues to word-initial consonants were manipulated. The results showed that the processing of a talker’s voice and the perception of voicing are asymmetrically dependent. In addition, when stimulus variability was increased in each dimension, the amount of orthogonal interference obtained for each dimension became significantly larger. The processing asymmetry between voice and phoneme was interpreted in terms of a parallel-contingent relationship of talker normalization processes to auditory-to-phonetic coding processes. The processing of voice information appears to be qualitatively different from the encoding of segmental phonetic information, although they are not independent. Implications of these results for current theories of speech perception are discussed.

278 citations


Journal ArticleDOI
TL;DR: The speech of 4 phonologically disordered children with place and voicing errors affecting initial stop consonants was described through phonological and acoustic analyses and sources of speech sound errors were hypothesized by comparing the children's underlying representations determined from both acoustic and descriptive phonological data.
Abstract: The speech of 4 phonologically disordered children with place and voicing errors affecting initial stop consonants was described through phonological and acoustic analyses. Productions of target voiced and voiceless alveolar and velar stops were transcribed and acoustically analyzed before and after treatment that was administered on a predetermined contrast. Three of the children produced significant, although largely imperceptible, differences in VOT for a given stop when it represented different adult stops. The presence of productive phonological knowledge, as inferred from acoustic data, facilitated rapid generalization of correct production of the treated contrast. In the absence of acoustically determined productive knowledge, a longer treatment period was necessary to achieve a lower level of production accuracy on the same treated contrast. Sources of speech sound errors for the 4 children were hypothesized by comparing the children's underlying representations determined from both acoustic and descriptive phonological data.

82 citations


Dissertation
01 Jan 1990
TL;DR: This thesis examines some of the cross-language differences and similarities of voicing contrasts of stop consonants in six Asian languages and explores ways of explaining their phonetic characteristics.
Abstract: This thesis examines some of the cross-language differences and similarities of voicing contrasts of stop consonants in six Asian languages and explores ways of explaining their phonetic characteristics. The languages investigated are Japanese, Mandarin Chinese, Korean, Burmese, Thai, and Hindi, and the examination is mainly based on the acoustic analysis of initial stops in these languages. The thesis consists of ten chapters. Chapter 1 deals with the background for the study, the scope of the study, and the general experimental procedure. Some of the articulatory, acoustic, and perceptual characteristics of stop consonants are examined. Chapters 2 to 7 present the results of acoustic analysis in each language. In each chapter, the general properties of stop consonants and the linguistic materials analyzed are presented, and acoustic characteristics such as voice onset time (VOT), fundamental frequency (F0) and its contour, spectral properties, and the onset frequency of the first formant (F1) are examined in detail. Chapter 8 examines cross-language characteristics based on the acoustic analysis. It can be said that the languages in the present study use several acoustic features for distinguishing voicing categories in different ways, and "same" or similar sounds in these languages show some language-specific properties as well as features which are common to many languages. VOT serves to distinguish voicing categories of stops when the contrast is based on the timing of glottal and supralaryngeal movements; when other laryngeal features are involved, additional acoustic dimensions are necessary for making the distinction. Furthermore, cross-language characteristics in acoustic dimensions such as F0 and its contour, spectral properties, and the F1 onset frequency are examined. Chapter 9 examines some theoretical issues in cross-language phonetics. A model was proposed in which cross-language differences can be expressed as differences in phonological features. Some generalized phonetic patterns which are shared by these languages are presented, and the underlying articulatory mechanisms and their implications are discussed.
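As a concrete illustration of the central measurement in such studies, here is a rough VOT estimator: the time from the release burst to the first clearly periodic frame. The burst threshold, frame length, and periodicity criterion are invented for the sketch and are far cruder than what a phonetic study would actually use:

```python
import numpy as np

def measure_vot(x, sr, burst_thresh=0.1, frame_s=0.01):
    """Crude VOT estimate: time from the release burst (first sample whose
    magnitude exceeds a threshold) to the first 'voiced' frame, detected
    here with a lag-1 autocorrelation smoothness test. A sketch only."""
    n = int(frame_s * sr)
    burst_i = int(np.argmax(np.abs(x) > burst_thresh))
    for start in range(burst_i + n, len(x) - n, n):
        seg = x[start:start + n]
        ac = np.correlate(seg, seg, mode='full')[n - 1:]
        # Require real energy and a smooth, periodic-looking frame.
        if ac[0] / n > 0.01 and ac[1] / ac[0] > 0.5:
            return (start - burst_i) / sr
    return None

# Synthetic token: an aperiodic burst at 10 ms, voicing (a sine) from 60 ms.
sr = 16000
x = np.zeros(1600)
x[160:240] = 0.5 * (-1.0) ** np.arange(80)   # noisy-looking release burst
x[960:] = 0.8 * np.sin(2 * np.pi * 150 * np.arange(640) / sr)
vot = measure_vot(x, sr)
```

On this synthetic token the estimator returns 0.05 s, i.e. a 50-ms VOT, which would fall on the voiceless-aspirated side of most of the contrasts discussed in the thesis.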

49 citations


Journal ArticleDOI
TL;DR: It is suggested that it is the termination value of the F1 offset transition rather than rate and/or duration of frequency change, which cues voicing in final velar stop consonants during the transition period preceding closure.
Abstract: The perception of voicing in final velar stop consonants was investigated by systematically varying vowel duration, change in offset frequency of the final first formant (F1) transition, and rate of frequency change in the final F1 transition for several vowel contexts. Consonant–vowel–consonant (CVC) continua were synthesized for each of three vowels, [i,I,ae], which represent a range of relatively low to relatively high‐F1 steady‐state values. Subjects responded to the stimuli under both an open‐ and closed‐response condition. Results of the study show that both vowel duration and F1 offset properties influence perception of final consonant voicing, with the salience of the F1 offset property higher for vowels with high‐F1 steady‐state frequencies than low‐F1 steady‐state frequencies, and the opposite occurring for the vowel duration property. When F1 onset and offset frequencies were controlled, rate of the F1 transition change had inconsistent and minimal effects on perception of final consonant voicing. Thus the findings suggest that it is the termination value of the F1 offset transition rather than rate and/or duration of frequency change, which cues voicing in final velar stop consonants during the transition period preceding closure.

39 citations


Journal ArticleDOI
TL;DR: In all three experiments, the F0 onset values contributed to the voicing judgment whether they were above or below the putative intonation contour, which argues for a noncategorical contribution of intonation.
Abstract: The post-stop-release rise or fall of fundamental frequency (F0) is known to affect voicing judgments of syllables with ambiguous voice onset times (VOTs). In 1986, Silverman claimed that the critical factor was not direction of F0 change but rather its direction relative to the intonational contour. He further claimed that only F0s that start above and fall to the contour have an effect proportional to the size of the frequency change; F0s that rise to the contour by different amounts were claimed to be equivalent. In our first experiment, we examined the effect on voicing judgments of five onset F0s preceding a single, flat contour. Only falling F0s were differentiated in the first set of judgments, but after increased exposure to the syllables, even F0s below the contour differentially affected the voicing judgment. In a second experiment, the contour of the final part of the syllable was flat, rising or falling. F0 contour affected the judgments, as did onset F0s, but the two factors did not interact, indicating that the onset values were not being judged by reference to the contours. However, the contour which was predicted to result in more voiceless judgments also ended at a higher F0 in the vowel, and another effect of voicing is that the F0 is higher throughout the vowel after voiceless stops. In a third experiment, F0 contours were created to contrast contour and mean F0. The effect of the F0 during the vocalic segment appeared to be attributable to the average F0 rather than the contour. In all three experiments, the F0 onset values contributed to the voicing judgment whether they were above or below the putative intonation contour. The contribution of the lower F0s, while significant, was not as great as that of the higher F0s, which argues for a noncategorical contribution of intonation.

36 citations


Journal ArticleDOI
TL;DR: A therapy technique is described that emphasizes "pushing harder" on voiceless consonants to improve the intelligibility of alaryngeal speakers.

19 citations


Journal ArticleDOI
TL;DR: The results support the predictions of an interactive-activation model, combining both top-down and bottom-up factors, and suggest that visual information appears to reduce the bias to report an item as intact.
Abstract: Phonemic restoration was studied using a version of Samuel's (1981a) psychophysical paradigm. We examined the influence of specific acoustic correlates of voicing and place of articulation on phonemic restoration (d') and response bias (Beta). The influence of a higher-level, phonotactic constraint was also examined. All of the stimuli were presented in both auditory-only and auditory-visual conditions, allowing the investigation of potential benefits of vision on phonemic restoration. The results support the predictions of an interactive-activation model, combining both top-down and bottom-up factors. As predicted, voicing and place of articulation significantly affected d': Voiceless stop consonants received greater restoration than voiced stops, and alveolar stops were less restorable than bilabial and velar stops. The phonotactic, top-down constraint affected neither d' nor Beta. Visual information, however, appeared to reduce the bias to report an item as intact.

19 citations


Journal ArticleDOI
TL;DR: Multidimensional scaling analysis of dissimilarity ratings indicated that listening experience leads to increased perceptual differentiation of phonetic categories drawn from a language unfamiliar to the listener, supporting the position that speech perception mechanisms at the feature level may be distributed asymmetrically across the hemispheres.

10 citations


Journal ArticleDOI
TL;DR: Amplitude measurements of 32 whispered tokens of /p/ and /b/ produced in four vowel contexts revealed that whispered /b/ tended to have a steeper rise slope relative to the following vowel than did /p/.
Abstract: Amplitude measurements of 32 whispered tokens of /p/ and /b/ produced in four vowel contexts revealed that whispered /b/ tended to have a steeper rise slope relative to the following vowel than did /p/. In a perceptual experiment, short (60-ms) whispered stop + vowel utterances were played to 8 listeners. Their identification scores were significantly above chance, but lower than what might be expected for stop + vowel stimuli produced with normal voicing. The pattern of identifications showed no relationship to the measured rise time differences.

8 citations


Journal ArticleDOI
TL;DR: In this article, the locus of lexical frequency influences in auditory word recognition was investigated using a speech identification paradigm, where subjects were required to identify the initial phoneme of tokens from voicing speech continua.
Abstract: The locus of lexical frequency influences in auditory word recognition was investigated using a speech identification paradigm. Subjects were required to identify the initial phoneme of tokens from voicing speech continua. Here 44 continua were constructed such that the voiced and voiceless endpoint of each pair contrasted in lexical frequency. Identification responses for ambiguous stimuli tended to be the higher‐frequency member. Reaction times for unambiguous stimuli showed an advantage for the high‐frequency responses; ambiguous stimuli showed no such reaction time advantage. The patterns of reaction time for word frequency were similar to influences of monetary payoff [Connine and Clifton, J. Exp. Psychol.: Hum. Percept. Perform. 13, 219–299 (1987)] and support a post‐perceptual influence of word frequency. [Work supported by NIH.]

Book ChapterDOI
TL;DR: It is found that if facilitation in the perceptual-motor priming task is because of a shared value on a perceptually salient dimension, then one would expect that stimulus–response pairs with a shared manner feature would produce faster reaction times than pairs without a shared manner feature.
Abstract: Publisher Summary This chapter discusses the different aspects of the perceptual-motor processing in speech. Natural language consists of patterns at many levels of analysis that can be analyzed without distinguishing between language perception and production. In comparison to voicing, the acoustic and articulatory correlates of place of articulation have no such simple shared characteristics that might be processed by a common mechanism. The acoustic cues to place are spectral in nature, consisting of the direction of formant transitions and the spectral pattern of the initial burst. The articulatory correlates of place of articulation consist of the location in the vocal tract where the closure, or constriction, is made. The perceptual-motor priming task reveals the speed with which the stimulus can be encoded and the response selected and produced. It is found that if facilitation in the perceptual-motor priming task is because of a shared value on a perceptually salient dimension, then one would expect that stimulus–response pairs with a shared manner feature would produce faster reaction times than pairs without a shared manner feature.

Journal ArticleDOI
TL;DR: In this article, an acoustical study of the post-vocalic obstruents showed that, although some of the consonants are perceived as voiced, all these consonants were produced without periodic vibrations.

Journal ArticleDOI
TL;DR: The main argument of Hoard's "Obstruent Voicing in Gitksan: Some Implications for Distinctive Feature Theory" (hereafter referred to as OVG) centers on the voicing of noncontinuant obstruents.
Abstract: 1. The main argument of Hoard (1978), "Obstruent Voicing in Gitksan: Some Implications for Distinctive Feature Theory" (hereafter referred to as OVG), centers on the voicing of noncontinuant obstruents in Gitksan. Hoard observes that Gitksan has a phonological rule that voices plain noncontinuant obstruents. He claims that this rule also applies to certain of the glottalized noncontinuant obstruents, which occur in three allophonic types: voiceless nonejective preglottalized stops/affricates occur finally; voiceless ejective glottalized stops/affricates occur as first members of clusters; and voiced implosive stops/affricates occur before [+sonorant] segments. Hoard also proposes a revision of the features in Chomsky and Halle (1968) that distinguish among sound types differing in airstream and larynx features, and he says that some of these involve cooccurrence restrictions. One such restriction is that voiced segments are also characterized by glottal constriction. Thus, it follows that as glottalized stops/affricates become voiced, they also redundantly have glottal constriction, and as the closed larynx moves rapidly downward, they are implosive. Our paper is a critique of the substantive portion of OVG. It offers alternative formulations of the obstruent voicing and other phonological rules that are based on different articulatory phonetic observations and on consideration of a wider range of forms, distributions, and alternations. It also provides instrumental evidence that Gitksan does not have voiced implosive stops; rather, it has lax glottalized stops that display a creaky voice quality at the margin of the vowel in pretonic (and syllable-final) environments.

Patent
31 Jan 1990
TL;DR: In this paper, a character string from an input device 11 is input to the language processing means 3 of the voicing and phoneme symbol generating means 1 of the text-to-voice converting device and processed by the language processing means 3 according to a program stored in a main control part 13.
Abstract: PURPOSE: To voice texts of respective languages with a simple system constitution, so that a user easily understands, by performing conversion into voicing and phoneme symbols for a 2nd predetermined language separately by a character string voicing and phoneme symbol converting means for a 1st language and then composing a voice by a voice composing means. CONSTITUTION: A character string from an input device 11 is input to the language processing means 3 of the voicing and phoneme symbol generating means 1 of the text-to-voice converting device and processed by the language processing means 3 according to a program stored in a main control part 13. This processing means 3 converts the character string into a voicing and phoneme symbol sequence for the 1st language written in a document storage part 14, and a voicing and phoneme symbol converting means 4 converts the voicing and phoneme symbol sequence produced by the means 3 into voicing and phoneme symbols for the 2nd language (English or German). Then the acoustic synthesizing means 22 composes a voice, and texts of many languages are output with the simple system constitution in a voice which is easily understood by the user.

Patent
05 Oct 1990
TL;DR: In this paper, the authors propose to realize a high recognition rate and to give a speaking person no feeling of uneasiness by informing the speaking person of a speech-recognizable period by a voicing start indication means and inhibiting the speaking person from voicing anything in a speech-unrecognizable period such as a voicing guidance period.
Abstract: PURPOSE: To realize a high recognition rate and to give a speaking person no feeling of uneasiness by informing the speaking person of a speech-recognizable period by a voicing start indication means and inhibiting the speaking person from voicing anything in a speech-unrecognizable period such as a voicing guidance period. CONSTITUTION: The voicing start indication means 9 informs the speaking person of the speech-recognizable period to inhibit the speaking person from voicing anything in the speech-unrecognizable period, such as the voicing guidance period of an automatic cup vending machine. Namely, the voicing start indication means 9 informs the speaking person of the speech-recognizable period; a lamp, etc., is provided in a conspicuous place on the front surface of, for example, the automatic cup vending machine, and a voicing start indication signal is OFF in the speech-unrecognizable period, such as the voicing guidance period, pattern selection processing period, etc., of the automatic cup vending machine, but ON in the speech-recognizable period. Therefore, the speaking person voices words when the voicing start indication signal is ON and never voices any word by mistake in the speech-unrecognizable period.


Patent
09 Jan 1990
TL;DR: In this article, the authors proposed to improve practical effect in word voicing training by retrieving a storage circuit, correcting the misextraction of a voiceless section at the end of voicing, and correcting the display when no voice is detected at the end of the voicing.
Abstract: PURPOSE: To improve practical effect in word voicing training by retrieving a storage circuit, correcting the misextraction of a voiceless section at the end of voicing, and correcting the display when no voice is detected at the end of the voicing. CONSTITUTION: A voice waveform detected by a microphone 1 is converted by a rectifying and integrating circuit 2 into a DC signal proportional to the level of the voice to extract a sound section. Vocal cord vibration detected by a vocal cord vibration sensor 4, on the other hand, is converted by a rectifying and integrating circuit 5 into a DC signal proportional to the level of the vocal cord vibration, and a threshold circuit 6 extracts a voiced section. The extracted sound section, and a section which is closer to a sound section than a voiced section but is not the voiced section, are decided as a voiceless section; other sections are decided as a soundless section. They are displayed on a display device 10 and also stored in a storage device 8. Then the storage circuit 8 is searched in reverse from the end of voicing for sections decided as soundless, and the misextraction of the voiceless section at the end is corrected and displayed. Consequently, trouble in the training is eliminated.
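The patent's section-labeling scheme can be sketched in a few lines. This is a simplified reading (per-frame thresholds on the two rectified-and-integrated level signals; the patent's "closer to a sound section" refinement is omitted), and the threshold values are invented:

```python
def label_sections(mic_level, fold_level, mic_thresh=0.1, fold_thresh=0.1):
    """Per-frame labels from two level signals, loosely following the
    patent's scheme: the vocal-fold vibration sensor marks 'voiced' frames,
    microphone energy without fold vibration marks 'voiceless' frames,
    and everything else is 'silent'."""
    labels = []
    for m, f in zip(mic_level, fold_level):
        if f > fold_thresh:
            labels.append('voiced')
        elif m > mic_thresh:
            labels.append('voiceless')
        else:
            labels.append('silent')
    return labels

def trim_trailing_voiceless(labels):
    """Back-search correction like the one described: a run of 'voiceless'
    frames after the last voiced frame is treated as misextracted breath
    noise and relabeled as silence."""
    out = list(labels)
    i = len(out) - 1
    while i >= 0 and out[i] != 'voiced':
        if out[i] == 'voiceless':
            out[i] = 'silent'
        i -= 1
    return out

frames = label_sections([0.0, 0.5, 0.5, 0.3, 0.0], [0.0, 0.4, 0.0, 0.0, 0.0])
corrected = trim_trailing_voiceless(frames)
```

In the toy input above, the two sound-but-unvoiced frames after the voiced frame are relabeled as silence by the backward search, which is the correction the patent aims at.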

Journal ArticleDOI
TL;DR: This paper found that if the voicing of word-final obstruents exerts little or no effect on preceding vowel duration in a language, then native speakers of that language will produce a far smaller vowel duration difference in isolated English words like beat and bead than do native speakers of English.
Abstract: Previous second‐language (L2) research has shown that if the voicing of word‐final obstruents exerts little or no effect on preceding vowel duration in a language, then native speakers of that language will produce a far smaller vowel duration difference in isolated English words like beat and bead than do native speakers of English. This might be due to phonetic interference, resulting from the perceptual identification of word‐final consonants in English and the L1. If so, one would expect native speakers of languages without word‐final obstruents to succeed better in acquiring the vowel duration cue to the word‐final English /t/ vs /d/ contrast. To test this, native speakers of English, Spanish, and Chinese read lists of minimally paired /bVt/ and /bVd/ words. Native English‐speaking listeners later identified far more final stops produced by native speakers of English than Spanish or Mandarin (95% vs 72%, 63%). When closure voicing and release burst cues were removed, the correct identification rates ...
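The vowel-duration cue itself is easy to state numerically. The sketch below computes the mean preceding-vowel duration difference for hypothetical /bVt/ vs /bVd/ tokens; all duration values are invented for illustration, not taken from the study:

```python
def vowel_duration_effect(bvt_durs_ms, bvd_durs_ms):
    """Mean preceding-vowel duration difference (ms) for the word-final
    /t/ vs /d/ contrast, e.g. 'beat' vs 'bead'. Positive values mean
    longer vowels before the voiced stop."""
    mean_t = sum(bvt_durs_ms) / len(bvt_durs_ms)
    mean_d = sum(bvd_durs_ms) / len(bvd_durs_ms)
    return mean_d - mean_t

# Invented values sketching the pattern the abstract describes: a large
# difference for native English speakers, a much smaller one for speakers
# whose L1 lacks the duration cue.
native = vowel_duration_effect([120, 130, 125], [200, 210, 190])
learner = vowel_duration_effect([140, 150, 145], [155, 160, 150])
```

A listener relying on this cue has far more to work with in the first case than in the second, which is one way to read the identification rates reported above.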




Journal ArticleDOI
TL;DR: In this article, the authors investigate the domain of laryngeal features in the Chonnam dialect of Korean and find that the accentual phrase is the dominant unit rather than the word.
Abstract: This paper investigates the domain of two aspects of laryngeal features in the Chonnam dialect of Korean. In Korean, voiceless lenis stops, /p,t,k/, sometimes become voiced between voiced segments. Traditionally, this voicing has been discussed as occurring “within words.” However, word‐initial lenis stops are sometimes voiced at fast tempo, suggesting that the domain is some prosodic unit such as the “accentual phrase” [S. A. Jun, J. Acoust. Soc. Am. Suppl. 1 85, S98 (1989)]. To test this, utterances of various constructions produced by three Chonnam speakers at three different tempi (slow, normal, fast) were recorded. An electroglottograph (EGG) was recorded simultaneously with the audio wave. In general, the audio waveform and EGG agreed: Either both showed voicing, or neither did. Only a few cases showed a discrepancy and, for almost all of these cases, the EGG data showed what was expected from the accentual phrasing. Since the number of accentual phrases within an utterance varied with the speech rate (the faster, the fewer), voicing also varied. Preliminary results of VOT measurements in aspirated stops show a similar effect of the accentual phrase boundary: VOT was longer at the beginning of an accentual phrase than medially. Thus the domain of all laryngeal feature effects seems to be the accentual phrase rather than the word.

Patent
16 Mar 1990
TL;DR: In this paper, a speech synthesis system consists of a speech data part 1, a speech synthesizer part 2, a data processing part 3, a cause and effect data part 4, and an argument processing part 5.
Abstract: PURPOSE: To generate a synthetic voice as if a speaker voiced, by detecting the motion of the muscles of the neck or the mouth of the speaker and its peripheral parts, or variation of breathing, even if the speaker does not actually voice. CONSTITUTION: The speech synthesis system consists of a speech data part 1, a speech synthesis part 2, a data processing part 3, a cause and effect data part 4, and an argument processing part 5. The data part 1 stores individual speech data generated previously from the voice of the speaker to obtain a voice which is as close to the actual speaker's voice as possible, so timbre is generated in consideration of the spectrum, the intonation, and the speed of voicing. The synthesis part 2 synthesizes a speech corresponding to the individual timbre when the timbre is input from the data part 1. The processing part 3 samples an input analog signal to extract its features and generates a signal based upon physical variation caused by the voicing operation of the speaker. The data part 4 stores cause and effect data output by the processing part 3 in correspondence with the individual timbre data, and the processing part 5 judges the data and supplies the result to the data part 1.

Journal ArticleDOI
TL;DR: In this article, the authors report on an articulatory investigation of acoustic findings that vowels tend to be longer before voiced than voiceless consonants, and that the duration of the closing gesture may be stretched, the consonantal gesture may begin later in the vowel, or the production of the entire vowel may be lengthened.
Abstract: This paper reports on an articulatory investigation of acoustic findings that vowels tend to be longer before voiced than voiceless consonants [House and Fairbanks, J. Acoust. Soc. Am. (1953)], before fricatives than stops [Umeda, J. Acoust. Soc. Am. (1975)], and before single consonants than before multiple consonants [Fowler, J. Exp. Psychol. (1983)]. Measurements were taken from tongue movement traces obtained with the Wisconsin microbeam system during vowels in accented and unaccented conditions. Assuming a gestural theory of speech production, three mechanisms for producing these duration differences are considered: The duration of the closing gesture may be stretched, the consonantal gesture may begin later in the vowel, or the production of the entire vowel may be lengthened. One subject's vowel durations were affected by the voicing and manner of the following consonant. Another's were affected only by consonant voicing when the word is accented, and only by consonant duration when the word is unaccented. Moreover, the effect of duration and voicing on the vowel duration was due, in part, to differences in how early the closing gesture for the consonant begins, while the effect of manner of articulation was due entirely to differences in the duration of the closing gesture. [Work supported, in part, by NSF.]

Journal ArticleDOI
TL;DR: The authors investigated the use of F1 onset information, which constitutes a cue to the voicing contrast in English, but not in French, in French-English bilinguals using a /pen/•/ben/ minimal pair, meaningful in both languages.
Abstract: The use of F1 onset information, which constitutes a cue to the voicing contrast in English, but not in French, was investigated in French-English bilinguals using a /pen/-/ben/ minimal pair, meaningful in both languages. Two stimulus continua were constructed by digitally editing natural speech tokens. In both, VOT varied from −40 to 40 ms but the [en] portion of the stimuli was taken from a [ben] token in the Ben/VOT condition and a [phen] token in the Pen/VOT condition. Stimuli were presented in identification tests in two language modes, with either an English or French precursor word before each token. Bilinguals resident in France or Great Britain were classified in terms of their degree of bilingualism and language bias. Results show a significant effect of vowel onset characteristics on phoneme boundary for all subject groups. Irrespective of language mode, listeners exposed to English early (monolinguals, “strong” bilinguals, and English-biased “mid” bilinguals) were, on average, unable to consistently label stimuli in the Pen/VOT condition as voiced, even in the presence of prevoicing.

Journal ArticleDOI
TL;DR: This article found that motor-motor adaptation is not a product of perceptual adaptation, and it is not a result of subjects producing longer voice onset times after adaptation to a voiced consonant rather than shorter voice onset times after adaptation to a voiceless consonant.
Abstract: The two experiments described in this paper were designed to investigate further the phenomenon called motor-motor adaptation. In the first investigation, subjects were adapted while noise was presented through headphones, which prevented them from hearing themselves. In the second experiment, subjects repeated an isolated vowel, as well as a consonant-vowel syllable which contained a stop consonant. The findings indicated that motor-motor adaptation is not a product of perceptual adaptation, and it is not a result of subjects producing longer voice onset times after adaptation to a voiced consonant rather than shorter voice onset times after adaptation to a voiceless consonant.

Journal ArticleDOI
TL;DR: In this paper, a nine-token synthetic VOT continuum for fricative voicing was constructed with 10 ms steps in the voicing process and relative onset time (ROT) difference limens were measured for a range of durations of a 100Hz sawtooth waveform, which served as the analog for voicing in the speech continuum.
Abstract: A nine-token synthetic VOT continuum for /feɪl/–/veɪl/ was constructed with 10-ms steps in fricative voicing. Perceptual studies revealed better-than-chance discrimination between pairs of tokens labeled /feɪl/, chance discrimination between pairs of tokens labeled /veɪl/, and a slight peak in the discrimination function at the labeling boundary between /feɪl/ and /veɪl/. To better understand the noncategorical discrimination data, relative onset time (ROT) difference limens were measured for a range of durations of a 100-Hz sawtooth waveform, which served as the analog for voicing in the speech continuum. ROT difference limens increased systematically with increasing duration of the standard sawtooth waveform. The ROT data suggest that better-than-chance discrimination near the /feɪl/ endpoint and chance discrimination near the /veɪl/ endpoint reflect larger absolute difference limens for onset of voicing as voicing duration was increased from /feɪl/ to /veɪl/. Traditional phonetic processes presumably account for the slight peak in the discrimination function at the labeling boundary between the /feɪl/ and /veɪl/ categories. [Research supported by NIH.]
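A nonspeech analog of the kind described, a 100-Hz sawtooth whose onset is delayed within a fixed-length stimulus, can be generated as follows. The total duration and sample rate are invented, and the frication portion is omitted, so this is only the "voicing" channel of such a stimulus:

```python
import numpy as np

def rot_stimulus(rot_ms, total_ms=300.0, f0=100.0, sr=16000):
    """A sawtooth 'voicing' analog whose onset is delayed by rot_ms within
    a fixed-length stimulus, sketching the relative-onset-time manipulation
    described in the abstract. Parameter values are illustrative only."""
    n_total = int(total_ms / 1000 * sr)
    n_delay = int(rot_ms / 1000 * sr)
    t = np.arange(n_total - n_delay) / sr
    # Sawtooth in [-1, 1): fractional part of t*f0, recentered.
    saw = 2 * (t * f0 - np.floor(t * f0 + 0.5))
    return np.concatenate([np.zeros(n_delay), saw])

s = rot_stimulus(50)   # 50-ms relative onset time
```

Stepping `rot_ms` in 10-ms increments would give a continuum analogous to the nine-token series, with longer delays corresponding to the /feɪl/ (shorter-voicing) end.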

Patent
27 Apr 1990
TL;DR: In this article, a character string to be converted into a voice is inputted from a character-string input terminal and the voicing speed of the whole document when the character string is converted to a voice, and then one of several predetermined stages of voicing speeds is selected.
Abstract: PURPOSE: To improve the naturalness of a synthesized voice by inserting parameters which indicate the voicing time of each phrase and the strength of the coupling of phrases, and determining whether or not a pause is interposed between adjacent phrases. CONSTITUTION: A character string to be converted into a voice is input from a character string input terminal 11. Further, the voicing speed of the whole document when the character string is converted into a voice is input from a voicing speed input terminal 12. The voicing speed is specified in units such as the number of morae per unit time, and one of several predetermined stages of voicing speed is selected. The input character string and voicing speed are sent to a phrase voicing time calculation part 14, which calculates a voicing time corresponding to the voicing speed of each phrase constituting the document and sends it to a pause insertion position determination part 15. Further, the strength of the coupling of adjacent phrases constituting the document is input from a phrase coupling extent input terminal 13 and sent to the determination part. This determination part 15 determines the pause insertion positions from the voicing time of each phrase and the strength of the coupling of the phrases.
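The decision the patent describes, inserting a pause at a phrase boundary when the coupling is weak or the accumulated voicing time grows long, can be sketched as follows. The threshold values and the exact combination rule are invented; the patent does not specify them here:

```python
def insert_pauses(phrase_times_ms, coupling, max_run_ms=2000.0, weak=0.3):
    """For each boundary between adjacent phrases, decide whether to insert
    a pause, given per-phrase voicing times (ms) and the coupling strength
    (0..1) of each adjacent pair. A sketch of the patent's idea, not its
    actual decision rule."""
    pauses = []
    run = phrase_times_ms[0]          # voicing time since the last pause
    for t, c in zip(phrase_times_ms[1:], coupling):
        if c < weak or run + t > max_run_ms:
            pauses.append(True)       # pause before this phrase
            run = t
        else:
            pauses.append(False)
            run += t
    return pauses

decisions = insert_pauses([800.0, 900.0, 700.0], [0.9, 0.2])
```

Here the first boundary is strongly coupled and short enough to run through, while the weakly coupled second boundary receives a pause, mirroring the determination part's two inputs.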