
Showing papers on "Voice" published in 1995


PatentDOI
TL;DR: A modular system and method is provided for encoding and decoding speech signals using voicing probability determination; the system can also be used to generate a variety of voice effects.
Abstract: A modular system and method is provided for encoding and decoding of speech signals using voicing probability determination. The continuous input speech is divided into time segments of a predetermined length. For each segment the encoder of the system computes the signal pitch and a parameter related to the relative content of voiced and unvoiced portions in the spectrum of the signal, expressed as a ratio Pv, defined as a voicing probability. The voiced portion of the signal spectrum, as determined by the parameter Pv, is encoded using a set of harmonically related amplitudes corresponding to the estimated pitch. The unvoiced portion of the signal is processed in a separate processing branch which uses a modified linear predictive coding algorithm. Parameters representing both the voiced and the unvoiced portions of a speech segment are combined in data packets for transmission. In the decoder, speech is synthesized from the transmitted parameters representing the voiced and unvoiced portions of the speech in reverse order. Boundary conditions between voiced and unvoiced segments are established to ensure amplitude and phase continuity for improved output speech quality. Perceptually smooth transitions between frames are ensured by an overlap-and-add method of synthesis. Also disclosed is the use of the system in the generation of a variety of voice effects.
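The abstract does not spell out how Pv is estimated. Below is a minimal Python sketch of such a front end, assuming (an assumption, not the patent's method) that Pv can be proxied by the normalized autocorrelation peak at the pitch lag; the frame length, sample rate, and pitch range are illustrative.

```python
# Sketch of a pitch / voicing-probability estimator for one frame.
# Assumption: Pv is proxied by the normalized autocorrelation peak,
# a common voicing measure, not necessarily the patent's estimator.
import numpy as np

def pitch_and_voicing(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (pitch_hz, pv) for one windowed speech frame."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0, 0.0                      # silent frame
    ac = ac / ac[0]                          # normalize so ac[0] == 1
    lo = int(fs / f0_max)                    # shortest candidate lag
    hi = min(int(fs / f0_min), len(ac) - 1)  # longest candidate lag
    lag = lo + int(np.argmax(ac[lo:hi]))
    pv = float(np.clip(ac[lag], 0.0, 1.0))   # voicing probability proxy
    return fs / lag, pv

fs = 8000
t = np.arange(0, 0.032, 1 / fs)              # one 32 ms segment
voiced = np.sin(2 * np.pi * 120 * t)         # synthetic voiced input
print(pitch_and_voicing(voiced * np.hamming(len(voiced)), fs))
```

In the patent's scheme, frames with Pv near 1 would be routed to the harmonic-amplitude branch and frames with Pv near 0 to the LPC branch, with mixed frames split between the two.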

151 citations



Book ChapterDOI
01 Sep 1995
TL;DR: This chapter shows that prosody conditions segmental and suprasegmental features: segments are lengthened at the end of a phrase (e.g. Oller, 1973; Beckman & Edwards, 1990), and the "gestural magnitude" of /h/ is weakened word medially or in deaccented words (Pierrehumbert & Talkin, 1992).
Abstract: Introduction It is well established that prosody conditions segmental and suprasegmental features. In English, for example, segments are lengthened at the end of a phrase (e.g. Oller, 1973; Beckman & Edwards, 1990), and the "gestural magnitude" of /h/ is weakened word medially or in deaccented words (Pierrehumbert & Talkin, 1992). Additionally, Korean exhibits two processes known in the literature (Kim-Renaud, 1974; Kang, 1992) as Lenis Stop Voicing and Coda Neutralization.

65 citations


Journal ArticleDOI
TL;DR: The present study examines adult and child word-initial voice onset time productions in English and Hindi to determine the age of acquisition of the phonemic voice contrast, and finds that the larger the post-release voice onset time differences between pair members in the adult model, the earlier the contrast is reliably produced by child language learners.
Abstract: The present study examines adult and child word-initial voice onset time productions in English and Hindi (10 adults and 20 children in each language) to determine the age of acquisition of the phonemic voice contrast. Cross-linguistic differences in patterns of acquisition were found, but these need not be traced to the different phonological systems per se. An examination of the data indicates that the best predictor of age of voice contrast acquisition across languages is one which rests on the actual acoustic differences between members of phonologically contrastive pairs. In general it was found that the larger the post-release voice onset time differences between pair members in the adult model, the earlier the contrast is reliably produced by child language learners.
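The predictor named above is simple arithmetic over labeled event times. A minimal Python sketch, with hypothetical label values rather than the study's data:

```python
# VOT and pair-separation arithmetic; the times below are hypothetical
# hand labels (seconds), not measurements from the study.

def vot_ms(burst_release_s, voicing_onset_s):
    """Voice onset time in ms; negative values indicate prevoicing."""
    return (voicing_onset_s - burst_release_s) * 1000.0

vot_b = vot_ms(0.100, 0.110)    # short-lag /b/: 10 ms
vot_p = vot_ms(0.100, 0.170)    # long-lag aspirated /p/: 70 ms

# The study's predictor: the post-release VOT separation between
# members of a contrastive pair in the adult model.
print(f"pair separation: {abs(vot_p - vot_b):.0f} ms")
```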

60 citations


Journal ArticleDOI
TL;DR: Examining the relationships between young cochlear-implant users' abilities to produce the speech features of nasality, voicing, duration, frication, and place of articulation and their abilities to utilize the features in three different perceptual conditions revealed that subjects who were most likely to hear the place-of-articulation, nasality, and voicing features in an audition-only condition were also most likely to speak these features correctly.
Abstract: The purpose of this investigation was to examine the relationships between young cochlear-implant users' abilities to produce the speech features of nasality, voicing, duration, frication, and place of articulation and their abilities to utilize the features in three different perceptual conditions: audition-only, vision-only, and audition-plus-vision. Subjects were 23 prelingually deafened children who had at least 2 years of experience (34 months on average) with a Cochlear Corporation Nucleus cochlear implant. They completed both the production and perception versions of the Children's Audio-Visual Feature Test, which comprises ten consonant-vowel syllables. An information transmission analysis performed on the confusion matrices revealed that children produced the place-of-articulation feature fairly accurately, and the voicing, duration, and frication features less accurately. Acoustic analysis indicated that voiced sounds were not distinguished from unvoiced sounds on the basis of voice onset time or syllabic duration. Subjects who were more likely to produce the place feature correctly were likely to have worn their cochlear implants for a greater length of time. Pearson correlations revealed that subjects who were most likely to hear the place-of-articulation, nasality, and voicing features in an audition-only condition were also most likely to speak these features correctly. Comparisons of test results collected longitudinally also revealed improvements in production of the features, probably as a result of cochlear-implant experience and/or maturation.
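Information transmission analysis is a standard computation over a confusion matrix (after Miller and Nicely). A minimal Python sketch; the 2x2 voicing matrix is a hypothetical illustration, not the study's data:

```python
# Relative transmitted information for a stimulus-by-response count
# matrix; the voicing counts below are invented for illustration.
import numpy as np

def relative_transmitted_info(confusions):
    """T(x;y)/H(x) for a count matrix with stimuli as rows."""
    p = confusions / confusions.sum()
    px = p.sum(axis=1, keepdims=True)        # stimulus marginals
    py = p.sum(axis=0, keepdims=True)        # response marginals
    nz = p > 0
    t = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return t / hx

voicing = np.array([[40.0, 10.0],            # voiced stimuli
                    [15.0, 35.0]])           # voiceless stimuli
print(f"voicing transmission: {relative_transmitted_info(voicing):.2f}")
```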

47 citations


Book
01 Jan 1995
TL;DR: In this volume, edited by Bruce Connell and Amalia Arvaniti, the contributions include an acoustic and electropalatographic study of lexical and postlexical palatalization in American English.
Abstract:
1. Introduction - Bruce Connell and Amalia Arvaniti
Part I. Features and Perception:
2. Intermediate properties in the perception of distinctive feature values - John Kingston and Randy L. Diehl
3. A double weak view of trading relations: comments on Kingston and Diehl - Terrance M. Nearey
4. Speech perception and lexical representations: the role of vowel nasalization in Hindi and English - John J. Ohala and Manjari Ohala
5. Processing versus representation: comments on Ohala and Ohala - James M. McQueen
6. On the status of redundant features: the case of backing and rounding - Kenneth De Jong
7. The perceptual basis of some sound patterns - John J. Ohala
Part II. Prosody:
8. Stress shift: do speakers do it or do listeners hear it? - Esther Grabe and Paul Warren
9. The phonology and phonetics of the rhythm rule - Irene Vogel, Timothy Bunnell, and Steven Hoskins
10. The importance of phonological transcription in empirical approaches to 'stress shift' versus 'early accent': comments on Grabe and Warren, and Vogel, Bunnell and Hoskins - Stefanie Shattuck-Hufnagel
11. Perceptual evidence for the mora in Japanese - Haruo Kubozono
12. On blending and the mora: comments on Kubozono - Mary E. Beckman
13. Toward a theory of phonological and phonetic timing: evidence from Bantu - Kathleen Hubbard
14. On phonetic evidence for the phonological mora: comments on Hubbard - Bernard Tranel
Part III. Articulatory Organization:
15. Prosodic patterns in the coordination of vowel and consonant gestures - Caroline L. Smith
16. 'Where' is timing?: comments on Smith - Richard Ogden
17. Asymmetrical prosodic effects on the laryngeal gesture in Korean - Sun-Ah Jun
18. On a gestural account of lenis stop voicing in Korean: comments on Jun - Gerard J. Docherty
19. A production and perceptual account of palatalization - Daniel Recasens, Jordi Fontdevila, and Maria Dolors Pallarès
20. An acoustic and electropalatographic study of lexical and postlexical palatalization in American English - Elizabeth C. Zsiga
21. What do we do when phonology is powerful enough to imitate phonetics: comments on Zsiga - James M. Scobbie
22. The influence of syntactic structure on [s] to [ʃ] assimilation - Tara Holst and Francis Nolan
23. Assimilation as gestural overlap: comments on Holst and Nolan - Catherine P. Browman
24. Orals, gutturals and the jaw - Sook-Hang Lee
25. The role of the jaw - active or passive?: comments on Lee - Francis Nolan
26. The phonetics and phonology of glottalized consonants in Lendu - Didier Demolin
27. Lendu consonants and the role of overlapping gestures in sound change: comments on Demolin - Louis Goldstein
Indexes.

33 citations


Journal ArticleDOI
TL;DR: The results show that the subjects are more advanced in the acquisition of the appropriate VOT values for the voiceless than for the voiced consonants, which may be related to the increased neuromuscular control and more complex muscle activity necessary for maintaining voicing during the closure, especially for velar stops.

26 citations



Journal ArticleDOI
TL;DR: In this paper, a computer is used to display speech waveforms in the time and frequency domains, and then the speech can be played back in its natural form, and also after it has been analyzed and synthesized.
Abstract: Speech produced by individual audience participants will be stored on a computer. Speech waveforms in the time and frequency domains will then be displayed by the computer. The speech will be played back in its natural form, and also after it has been analyzed and synthesized. Various transforms of the speech will be implemented and played back. For example, the speech will be speeded up and slowed down. The gender of the voice will be changed by altering the pitch periods. The speech will be made to seem as if the talker had a cold by changing the voicing of the consonant sounds.
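Of the transforms listed, time-scaling by resampling is the simplest to sketch; note that naive resampling shifts pitch along with rate, which is why the gender transform described above edits pitch periods instead (a PSOLA-style operation this sketch does not attempt). Parameters are illustrative:

```python
# Naive time-scaling by linear-interpolation resampling: factor > 1
# plays back faster (and higher), factor < 1 slower (and lower).
import numpy as np

def naive_time_scale(x, factor):
    n_out = int(len(x) / factor)
    src = np.linspace(0, len(x) - 1, n_out)  # fractional read positions
    return np.interp(src, np.arange(len(x)), x)

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 150 * t)       # stand-in for a stored voice
faster = naive_time_scale(x, 1.5)     # 50% faster, pitch up a fifth
slower = naive_time_scale(x, 0.75)    # slower and lower
print(len(x), len(faster), len(slower))
```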

18 citations



Journal ArticleDOI
TL;DR: It is suggested that the direction of short-duration fundamental frequency perturbations following consonants helps to signal consonant [+voice]/[-voice] (abbreviated as [voice]) status, and it is proposed that the [voice] cue corresponds to the direction and extent of F0 perturbations relative to the overall intonation contour.
Abstract: Previous research has suggested that the direction of short-duration fundamental frequency (F0) perturbations following consonants helps to signal consonant [+voice]/[-voice] (abbreviated as [voice]) status. It has been proposed that the [voice] cue corresponds to the direction and extent of F0 perturbations relative to the overall intonation contour. A competing view, the low-frequency hypothesis, suggests that F0 participates in a more general way whereby low-frequency energy near the consonant contributes to [+voice] judgments. Listeners identified multiple stimulus series, each varying in voice onset time and ranging from /aga/ to /aka/. The series differed in overall intonation contour as well as in the direction of F0 perturbation relative to that contour. Consistent with one version of the low-frequency hypothesis, the F0 value at voicing onset, rather than the relative direction of the F0 perturbation, was the best predictor of [voice] judgments.


Journal Article
TL;DR: The results of these studies indicate that in the cases investigated, the coding of voice source information by rate of stimulation does not significantly augment the cues present in the spatially distributed constant rate stimulation pattern.
Abstract: Two studies are reported in which the effectiveness of explicitly coding voicing and fundamental frequency information for the Nucleus cochlear implant was investigated. In the first study, the voicing perception of a group of three experienced Multipeak users was evaluated when they were using Multipeak and a modified Multipeak in which the explicit fundamental frequency and voicing cues were eliminated and replaced with a 250-Hz constant rate of stimulation. The results of consonant and monosyllabic word tests showed that there was no significant difference in the subjects' ability to discriminate voicing. In the second study, the ability of a group of five experienced users of the constant rate spectral maxima sound processor (SMSP) strategy to discriminate suprasegmental contrasts was evaluated when they were using the SMSP strategy and a modified SMSP strategy that included a rate-encoded representation of the fundamental frequency on the most apical stimulation channel. The results of intonation, roving stress, and question-statement tests showed that there was no significant difference between the scores recorded with these strategies. Since the temporal voicing cue is not a primary cue to voicing discrimination for Multipeak users, and the provision of an additional rate cue to the SMSP strategy does not improve SMSP users' ability to discriminate suprasegmental contrasts, the results of these studies indicate that in the cases investigated, the coding of voice source information by rate of stimulation does not significantly augment the cues present in the spatially distributed constant rate stimulation pattern.



Journal ArticleDOI
TL;DR: Apart from phonological features and their individual phonetic correlates, an intermediate level of structure exists in which subsets of phonetic properties form perceptually coherent units, referred to as "integrated perceptual properties."
Abstract: Apart from phonological features and their individual phonetic correlates, an intermediate level of structure apparently exists in which subsets of phonetic properties form perceptually coherent units, referred to here as ‘‘integrated perceptual properties.’’ The mapping between each successive level of structure is arguably many‐to‐one, elevating both redundancy and distinctiveness at the level of phonological features. For the distinctive feature [voice], a main integrated perceptual property corresponding to the [+voice] value is the presence of low‐frequency energy during or near the consonant, which may be further analyzed into at least three phonetically distinct subproperties: voicing during the consonant constriction, a low F1 near the constriction, and a low F0 in the same region. Two predictions follow if these three subproperties contribute to a single integrated perceptual property. One is that the effects on [voice] judgments of varying either a low F1 or F0 should pattern in similar ways for...

Book
01 Jan 1995
TL;DR: Perceiving vowels in the presence of another sound - a quantitative test of the "old-plus-new" heuristic (C.J. Darwin); the contribution of instance-specific characteristics to speech perception; and a non-reductive view of speech processing.
Abstract:
- Perceiving vowels in the presence of another sound - a quantitative test of the "old-plus-new" heuristic, C.J. Darwin
- On the contribution of instance-specific characteristics to speech perception, A.R. Bradlow et al
- Perceiving visual and auditory information in consonant-vowel and vowel syllables, M.M. Cohen and D.W. Massaro
- Phonetic and lexical effects in speech perception, W. Serniclaes et al
- Relationships between different descriptive frameworks for plosive features of voicing and aspiration, C. Scully and S. Mair
- Vers une unification des espaces vocaliques, L.J. Boe et al
- Vowel transitions, vowel systems, and the distinctive region model, R. Carre and M. Mrayati
- Characterising formant trajectories by tracking vocal tract resonances, G. Bailly
- Variational formulation of the acoustico-articulatory link and the inverse mapping by means of a neural network, P. Jospa et al
- The supervision of speech production, M.A.A. Tatham
- De l'impertinence, ou comment relier complexite linguistique et qualite acoustique, C. Benoit and C. Abry
- From speech variability to pattern processing - a non-reductive view of speech processing, J.-S. Lienard
- Multitiered phonetic approach to speech labelling, A. Marchal et al
- A principle-based model for predicting the prosody of speech, M. Rossi
- Interaction prosodie-syntaxe en francais, cas des adverbes en -ment, P. Martin
- The psychological reality of phonological constructs and writing systems, D. Holender
- On the relations between phonological and metaphonological processing - a study using the ABX discrimination task, J. Morais et al
- Phonological component in automatic speech recognition: the case of liaison processing, G. Perennou
- Towards a multilevel model for hypothetical reasoning in continuous speech recognition, A. Bonneau et al
- Improving the quality of speech synthesis at segmental level, W.J. Hess
- Dynamic models of the glottal pulse, J.B. Schoentgen
- On the use of SVD and high-order statistics for the endpoint detection of speech, M. Rangoussi et al



Journal ArticleDOI
TL;DR: The authors examined articulatory kinematics in /aCV/ sequences, where the consonant is one of the set /p,b,t,d,k,g/ and the second vowel one of /i,a,u/.
Abstract: Lip closing movements for bilabial stops have been reported to be faster and of shorter duration for voiceless than for voiced stops. The experimental evidence is conflicting, however, and recordings have mostly been limited to lip and jaw movements in a single dimension. The present study examines articulatory kinematics in /aCV/ sequences, where the consonant is one of the set /p,b,t,d,k,g/ and the second vowel one of /i,a,u/. A magnetometer system was used to track vertical and horizontal movements of receivers placed on the lips and the jaw, and on four points on the tongue. Tangential velocity was used to define movement onsets and offsets. Movement amplitude was calculated as the path of the receiver from movement onset to offset. Preliminary results from two subjects suggest the possibility that the effects of consonant voicing on movement kinematics vary for different articulators. Tongue body movements towards consonantal closure had consistently higher velocity, larger amplitude and longer duration for voiced than for voiceless velar stops. Tongue tip and lip and jaw closing movements showed less robust differences between voiced and voiceless alveolar and labial stops. [Work supported by NIH.]
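The measures described, tangential velocity, threshold-defined onsets and offsets, and path-length amplitude, are straightforward to compute from receiver coordinates. A minimal Python sketch; the 20%-of-peak-velocity threshold and the synthetic gesture are assumptions for illustration, not the study's criteria or data:

```python
# Kinematic measures from 2-D receiver samples: tangential velocity,
# velocity-threshold onset/offset, and path-length amplitude.
import numpy as np

def movement_measures(x, y, fs, thresh_frac=0.2):
    vx, vy = np.gradient(x) * fs, np.gradient(y) * fs
    v = np.hypot(vx, vy)                      # tangential velocity
    above = np.where(v > thresh_frac * v.max())[0]
    onset, offset = above[0], above[-1]
    step = np.hypot(np.diff(x), np.diff(y))   # per-sample path increments
    amplitude = step[onset:offset].sum()      # path from onset to offset
    return onset / fs, offset / fs, v.max(), amplitude

fs = 500                                      # illustrative sampling rate
t = np.arange(0, 0.3, 1 / fs)
x = np.zeros_like(t)                          # horizontal position (mm)
y = 5 * (1 - np.cos(np.pi * t / 0.3))         # 10 mm closing gesture
print(movement_measures(x, y, fs))
```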



01 Jan 1995
TL;DR: The purpose of this paper is to account for the distribution of voice in Japanese by establishing a single constraint ranking that covers Japanese vocabulary of any origin, rather than different rankings depending on the origin of the vocabulary.
Abstract: It is well known that there are four classes of Japanese vocabulary with respect to origin: Yamato vocabulary consists of native morphemes, Sino-Japanese consists of morphemes borrowed from Chinese, Foreign consists of loanwords from languages other than Chinese, and Mimetic describes sounds or manners. Each of these classes has different phonological properties. There are three phenomena with respect to the distribution of voice in Japanese. One of them is that post-nasal obstruents in Yamato vocabulary and Mimetic are mostly voiced while those in Sino-Japanese and Foreign are not. I will mainly focus on this property in this paper. However, I will also discuss the other phenomena, namely the compound voicing alternation (Rendaku) and the restriction of voiced sounds in a morpheme (Lyman's Law). These phenomena typically occur with Yamato vocabulary only. Although the domain of each phenomenon largely overlaps with a certain class of lexical origin, they do not match completely. The purpose of this paper is to account for the distribution of voice in Japanese by establishing a constraint ranking that covers Japanese vocabulary of any origin. The organization of the paper is as follows. In section 2, I will present data and four problems to be solved. General tendencies of Yamato vocabulary are summarized in 2.1, and many exceptions to the generalization are presented in 2.2. In section 3, I will give an analysis using a unified ranking rather than different rankings depending on origins of the vocabulary. In section 4, I will present two pieces of evidence, historical and acquisitional, to support my claim that Japanese has only one ranking.
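The paper's central device, a single ranked constraint hierarchy evaluated under strict domination, can be sketched mechanically. A toy Python version follows; the constraints and the /sin+te/ -> [sinde] post-nasal voicing example are simplified illustrations, not the author's actual tableau:

```python
# Toy Optimality-Theoretic evaluation under strict domination: the
# winning candidate has the lexicographically smallest violation profile.

def eval_ot(candidates, constraints):
    return min(candidates, key=lambda cand: [c(cand) for c in constraints])

def star_nc(cand):       # *NC: penalize a voiceless obstruent after a nasal
    return sum(1 for a, b in zip(cand, cand[1:])
               if a == "n" and b in "ptks")

def ident_voice(cand):   # faithfulness to the underlying form /sinte/
    return sum(1 for a, b in zip(cand, "sinte") if a != b)

# With *NC ranked above Ident(voice), post-nasal voicing wins.
print(eval_ot(["sinte", "sinde"], [star_nc, ident_voice]))  # -> sinde
```

Reranking the two constraints would select faithful [sinte], the pattern seen in Sino-Japanese and Foreign items; the paper's burden is to derive both patterns from one unchanged ranking.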


01 Jan 1995
TL;DR: In this article, the authors reduce the voicing effects found across languages to two simple rules: the first accounts for final devoicing in a language like German, and the second accounts for the voicing agreement commonly found in consonant clusters, known as voicing assimilation.
Abstract: Recent cross-linguistic research on voicing has reduced many of the voicing effects found across languages to two simple rules. The first is the mechanism which accounts for final devoicing in a language like German. The second is the rule which accounts for the voicing agreement commonly found in consonant clusters, known as voicing assimilation. Starting with final devoicing, the basic idea is that for some languages, the feature [voice] delinks from the rime position of a syllable. As we see in (1), we can think of this in two slightly different ways:
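A minimal Python sketch of the two rules as ordered string rewrites over a toy obstruent inventory; the test words (German Tag, Russian zubki) are standard textbook illustrations, not examples drawn from the paper:

```python
# Final devoicing and regressive voicing assimilation as string rewrites.
DEVOICE = dict(zip("bdgzv", "ptksf"))    # voiced -> voiceless partner
VOICE = {v: k for k, v in DEVOICE.items()}
OBSTRUENTS = set(DEVOICE) | set(VOICE)

def final_devoicing(word):
    """Delink [voice] from a word-final obstruent (German-style)."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word

def voicing_assimilation(word):
    """Make each obstruent agree in voicing with a following obstruent."""
    out = list(word)
    for i in range(len(out) - 2, -1, -1):          # spread right to left
        a, b = out[i], out[i + 1]
        if a in OBSTRUENTS and b in OBSTRUENTS:
            out[i] = VOICE.get(a, a) if b in DEVOICE else DEVOICE.get(a, a)
    return "".join(out)

print(final_devoicing("tag"))            # -> tak
print(voicing_assimilation("zubki"))     # -> zupki
```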