
Showing papers on "Voice" published in 1997


Book
10 Feb 1997
TL;DR: This book introduces basic acoustics and acoustic filters, digital signal processing, and audition, then develops the acoustic theory of speech production (deriving schwa) and applies it to the acoustics and perception of vowels, fricatives, stops and affricates, and nasals and laterals.
Abstract: Acknowledgments. Introduction.
1. Basic Acoustics and Acoustic Filters: 1.1. The Sensation of Sound. 1.2. The Propagation of Sound. 1.3. Types of Sounds. 1.3.1. Simple Periodic Waves. 1.3.2. Complex Periodic Waves. 1.3.3. Aperiodic Waves. 1.4. Acoustic Filters. Exercises.
2. Digital Signal Processing: 2.1. Continuous versus Discrete Signals. 2.2. Analog-to-Digital Conversion. 2.2.1. Sampling. 2.2.2. Quantization. 2.3. Signal Analysis Methods. 2.3.1. Auto-Correlation Pitch Tracking. 2.3.2. RMS Amplitude. 2.3.3. Fast Fourier Transform (FFT). 2.3.4. Digital Filters. 2.3.5. Linear Predictive Coding (LPC). 2.3.6. Spectra and Spectrograms. Exercises.
3. Basic Audition: 3.1. Anatomy of the Peripheral Auditory System. 3.2. The Auditory Sensation of Loudness. 3.3. Frequency Response of the Auditory System. 3.4. Auditory Representations. Exercises.
4. Speech Perception: 4.1. A Speech Perception Experiment. 4.2. Maps from Distances. 4.3. The Perceptual Map of Fricatives. 4.4. The Perceptual Map of [Place]. 4.5. The Limits of Perceptual Universality: A Cross-Linguistic Map of Chinese Tones. Exercises.
5. The Acoustic Theory of Speech Production: Deriving Schwa: 5.1. Voicing. 5.2. Voicing Quanta. 5.3. Vocal Tract Filtering. 5.4. Pendulums, Standing Waves, and Vowel Formants. 5.5. LPC Spectral Analysis. Exercises.
6. Vowels: 6.1. Tube Models of Vowel Production. 6.2. Perturbation Theory. 6.3. "Preferred" Vowels: Quantal Theory and Adaptive Dispersion. 6.4. Vowel Formants and the Acoustic Vowel Space. 6.5. Auditory and Acoustic Representations of Vowels. 6.6. Cross-Linguistic Vowel Perception. Exercises.
7. Fricatives: 7.1. Turbulence. 7.2. Place of Articulation in Fricatives. 7.3. Quantal Theory and Fricatives. 7.4. Fricative Auditory Spectra. 7.5. Dimensions of Fricative Perception. Exercises.
8. Stops and Affricates: 8.1. Source Functions for Stops and Affricates. 8.1.1. Phonation Types. 8.1.2. Sound Sources in Stops and Affricates. 8.2. Vocal Tract Filter Functions in Stops. 8.3. Affricates. 8.4. Auditory Properties of Stops. 8.5. Stop Perception in Different Vowel Contexts. Exercises.
9. Nasals and Laterals: 9.1. Bandwidth. 9.2. Nasal Stops. 9.3. Laterals. 9.4. Nasalization. 9.5. Nasal Consonant Perception. Exercises.
References. Answers to Selected Short-Answer Questions. Index.

283 citations
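The signal-analysis methods listed in chapter 2 of the book above are easy to make concrete. Here is a minimal Python/NumPy sketch of the auto-correlation pitch-tracking idea (section 2.3.1); the frame length and F0 search range are illustrative choices, not the book's.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=75.0, fmax=400.0):
    """Estimate F0 by locating the autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / fmax)                      # shortest lag considered
    hi = min(int(fs / fmin), len(ac) - 1)    # longest lag considered
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

# A 40 ms frame of a synthetic 120 Hz complex periodic wave (cf. section 1.3.2):
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
print(round(autocorr_pitch(frame, fs), 1))   # ~120 Hz
```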


Journal ArticleDOI
TL;DR: This paper investigated the effect of speaking rate on stop consonant production in three languages that have different phonetic categories of voicing and found that the short-lag category did not change as a function of speaking rate in any of the three languages examined.

198 citations


Journal ArticleDOI
TL;DR: Students of linguistic sound systems would do well to study the phonetic base in detail; as an example case, this chapter considers a voiceless aspirated stop, which consists of an oral occlusion, cued by silence, and an articulatorily independent laryngeal abduction, cued by broadband noise.
Abstract: Linguistic sound systems necessarily possess contrastive values that are sufficiently distinct from one another that their individual characters may be learned by the listener. In this way, any given value in any given system fulfils its functional role of rendering forms distinct which differ in meaning. Articulatory, aerodynamic, acoustic and auditory constraints serve to mediate between such sound–meaning correspondences in non-trivial ways. Indeed, if it can be shown that the sound patterns of language are in part explainable by these physical systems, then students of linguistic sound systems would do well to study in detail the phonetic base. Consider an example case. Laryngeal gestures and supralaryngeal gestures are by and large articulatorily independent of each other. Thus, for example, a voiceless aspirated stop consists of an oral occlusion, cued by silence, as well as an articulatorily independent laryngeal abduction, cued by broadband noise. Were the phonetic realisation of these two gestures strictly simultaneous, the cues signalling the laryngeal abduction would not be perceived as such by the listener (*[ot]). A listener can tell that there is no voicing, but cannot recover more specific information regarding the state of the glottis during oral closure. Stated simply, the full closure here reduces the acoustic output to zero. With zero acoustic energy, no source information other than silence is transmitted to the listener. However, upon staggering the two gestures, the otherwise obscured information is rendered salient.

78 citations


Journal ArticleDOI
TL;DR: The findings from these experiments indicate that phoneme and rate information are encoded in an integral manner during speech perception, while talker characteristics are encoded separately.
Abstract: The acoustic structure of the speech signal is extremely variable due to a variety of contextual factors, including talker characteristics and speaking rate. To account for the listener’s ability to adjust to this variability, speech researchers have posited the existence of talker and rate normalization processes. The current study examined how the perceptual system encoded information about talker and speaking rate during phonetic perception. Experiments 1–3 examined this question, using a speeded classification paradigm developed by Garner (1974). The results of these experiments indicated that decisions about phonemic identity were affected by both talker and rate information: irrelevant variation in either dimension interfered with phonemic classification. While rate classification was also affected by phoneme variation, talker classification was not. Experiment 4 examined the impact of talker and rate variation on the voicing boundary under different blocking conditions. The results indicated that talker characteristics influenced the voicing boundary when talker variation occurred within a block of trials only under certain conditions. Rate variation, however, influenced the voicing boundary regardless of whether or not there was rate variation within a block of trials. The findings from these experiments indicate that phoneme and rate information are encoded in an integral manner during speech perception, while talker characteristics are encoded separately.

69 citations


Journal ArticleDOI
TL;DR: Results showed that the patterns of voicing in the fricative noise interval were influenced by the voicing characteristics of preceding stop consonants, but phonetic context did not affect the criterial attribute associated with the phonetic category of voicing.
Abstract: This study investigated the acoustic characteristics of voicing in the production of fricative consonants. The fricatives [f v s z] were used in combination with the vowels [i e a o u] to create CV syllables, which were produced by four subjects both in a context condition (following voiced and voiceless velar stops) and in isolation. Analyses were conducted of the time course of glottal excitation during the fricative noise interval in the voiced and voiceless fricative stimuli. Results showed that the patterns of voicing in the fricative noise interval were influenced by the voicing characteristics of preceding stop consonants. Nonetheless, these carryover coarticulatory effects were short-lived, influencing only the first few tens of milliseconds of the following segment. Despite the influence of phonetic context on the patterns of voicing, an acoustic measure relating to the presence or absence of glottal excitation at the acoustic boundaries of the fricative noise reliably classified a majority (93%) of the fricative consonants in terms of the phonetic category of voicing. Thus, while phonetic context affected the patterns of glottal excitation in the fricative noise interval, it did not affect the criterial attribute associated with the phonetic category of voicing.

48 citations
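The criterial attribute in the study above, presence or absence of glottal excitation at the acoustic boundaries of the fricative noise, invites a simple operationalization. The Python/NumPy sketch below scores periodicity in short windows at the noise onset and offset; the window length and threshold are assumptions for illustration, not the study's values.

```python
import numpy as np

def periodicity(frame, fs, fmin=75.0, fmax=400.0):
    """Normalized autocorrelation peak in the plausible F0 lag range (0 = aperiodic)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    return float(ac[lo:hi].max() / ac[0])

def glottal_excitation_at_edges(noise, fs, win_s=0.025, threshold=0.3):
    """True if either edge of the fricative noise interval shows periodic
    (glottal) excitation above the threshold."""
    n = int(win_s * fs)
    return max(periodicity(noise[:n], fs), periodicity(noise[-n:], fs)) > threshold
```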



Journal ArticleDOI
TL;DR: The authors examined influences of phonological representations upon the temporal relations in the production of word-medial VC-sequences.
Abstract: This paper examines influences of phonological representations upon the temporal relations in the production of word-medial VC-sequences. The parameters under investigation are vowel, closure, and ...

45 citations


Journal ArticleDOI
TL;DR: This paper reviews evidence from brain-damaged populations indicating that the perception of certain voicing cues is less dependent upon left hemisphere mechanisms than the ability to perceive place of articulation contrasts, suggesting that the right hemisphere may play a special role in the categorical processing of voicing.

42 citations


Patent
20 Oct 1997
TL;DR: In this patent, the system comprises a dictionary part 107 in which the object words of voice recognition are gathered, a voice analysis part 103 that performs voice analysis, a sound model part 105 that holds patterns of voice in phoneme units, a voicing-deformation feeling model part 106 that represents the deformation of the vocal sound spectrum caused by a feeling, and a voice recognition part 104 that performs voice recognition by coupling the sound model, voice analysis, and voicing-deformation feeling model.
Abstract: PROBLEM TO BE SOLVED: To recognize the level of a speaker's feeling with a voice recognition system. SOLUTION: The system and method are equipped with a dictionary part 107 in which the object words of voice recognition are gathered, a voice analysis part 103 that performs the voice-analyzing process, a sound model part 105 that has patterns of voice in phoneme units, a voicing-deformation feeling model part 106 that represents the deformation of the vocal sound spectrum caused by a feeling, and a voice recognition part 104 that performs the voice-recognizing process by coupling the sound model part 105, the voicing-deformation feeling model part 106, and the dictionary part 107, and that outputs an object word of voice recognition as the recognition result together with a feeling level representing the degree of the speaker's feeling carried by the voice. Other voice analysis parts output feeling levels from features of the power of the voice.

41 citations
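A toy rendering of the patent's block diagram may help readers follow the coupling of parts 103-107. Everything below is a hypothetical stand-in sketched in Python/NumPy: the patent specifies the parts and their coupling, not these particular computations.

```python
import numpy as np

def analyze(signal, fs, frame=400):
    """Part 103 stand-in (voice analysis): log frame power as the feature track."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-9)

def feeling_level(features):
    """Part 106 stand-in: treat power variability as the degree of feeling (0..1)."""
    return float(np.clip(np.std(features) / 3.0, 0.0, 1.0))

def recognize(features, dictionary):
    """Parts 104/105/107 stand-in: nearest stored template wins; the output is
    both the recognized object word and the feeling level."""
    def dist(a, b):
        m = min(len(a), len(b))
        return float(np.mean((a[:m] - b[:m]) ** 2))
    word = min(dictionary, key=lambda w: dist(features, dictionary[w]))
    return word, feeling_level(features)

fs = 8000
rng = np.random.default_rng(0)
dictionary = {"hai": analyze(rng.standard_normal(fs), fs),
              "iie": analyze(0.5 * rng.standard_normal(fs), fs)}
print(recognize(analyze(rng.standard_normal(fs), fs), dictionary))
```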


Journal ArticleDOI
TL;DR: It is speculated that phonological processes may be affected both by speakers' decision processes to adjust their articulation for the benefit of the listener and by speakers' internal structure and interactive activation among linguistic units.
Abstract: Previous research has demonstrated that semantics and pragmatics influence durational modifications in words and segments. The present research investigated specifically how semantics and pragmatics influence preservation of a phonemic contrast. Experiment 1 examined alveolar flapping in American English. Potentially flapped words, for example, writer and rider, were embedded in each of two types of semantic passages: semantically biasing and semantically neutral passages. In addition, these passages were produced in one of two pragmatic conditions: listener-present and listener-absent. The results showed that the phonemic voicing distinction between /t/ and /d/ was preserved in biasing passages and in the listener-present condition. The /t/-/d/ distinction was not preserved in neutral passages or in the listener-absent condition. Experiment 2 examined whether listeners could use the durational differences found to distinguish phonemic voicing in Experiment 1. Our investigation demonstrates that semantics and pragmatics interact with phonological processes in speech production. We speculate that phonological processes may be affected both by speakers' decision processes to adjust their articulation for the benefit of the listener and by speakers' internal structure and interactive activation among linguistic units.

35 citations


Journal ArticleDOI
TL;DR: The results of these converging tests lead to the conclusion that speech perception involves a process in which acoustic information for coarticulated gestures is parsed from the stream of speech.
Abstract: Coarticulatory acoustic variation is presumed to be caused by temporally overlapping linguistically significant gestures of the vocal tract. The complex acoustic consequences of such gestures can be hypothesized to specify them without recourse to context-sensitive representations of phonetic segments. When the consequences of separate gestures converge on a common acoustic dimension (e.g., fundamental frequency), perceptual parsing of the acoustic consequences of overlapping spoken gestures, rather than associations of acoustic features, is required to resolve the distinct gestural events. Direct tests of this theory were conducted. These tests revealed mutual influences of (1) fundamental frequency during a vowel on prior consonant perception, and (2) consonant identity on following vowel stress and pitch perception. The results of these converging tests lead to the conclusion that speech perception involves a process in which acoustic information for coarticulated gestures is parsed from the stream of speech.

Journal ArticleDOI
TL;DR: Evidence is provided for the generality of this effect by showing analogous results for a /b/-/w/ contrast, specified by transition duration, and the implications for models of rate-dependent processing are discussed.
Abstract: Many studies have shown that listeners process speech in a rate-dependent manner, altering the location of phonetic category boundaries in accord with the acoustic consequences of a change in rate during speech production. In a recent series of papers that focused on a voicing contrast, we reported that the perceptual adjustment for rate is not limited to the region of the category boundary, but extends to well within the category, producing a change in which stimuli are perceived to be the best category exemplars. In the current paper, we provide evidence for the generality of this effect by showing analogous results for a /b/-/w/ contrast, specified by transition duration. The implications of these findings for models of rate-dependent processing are discussed.

Proceedings ArticleDOI
21 Apr 1997
TL;DR: In an investigation of emotionally loaded speech material it was shown that the named acoustic parameters are useful for differentiating between the emotions happiness, sadness, anger, fear, and boredom.
Abstract: It is well known that personal voice qualities differ in speakers' use of temporal structures, F0 contours, articulation precision, vocal effort, and type of phonation. Whereas temporal structures and F0 contours can be measured directly in the acoustic signal, and conclusions about articulation precision can be drawn from the formant structure, this paper focuses especially on vocal effort and the type of phonation. These voice quality percepts are a combination of several acoustic voice quality parameters: the glottal pulse shape in the time domain (or damping of the harmonics in the frequency domain), the spectral distribution of turbulent signal components, and voicing irregularities. In an investigation of emotionally loaded speech material it was shown that the named acoustic parameters are useful for differentiating between the emotions happiness, sadness, anger, fear, and boredom. The perceptual importance of the above acoustic parameters is investigated in perception experiments with synthetic and resynthesized speech.
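Two of the acoustic parameters named above are straightforward to compute. The sketch below estimates spectral tilt, a frequency-domain proxy for damping of the harmonics and vocal effort, and period-to-period jitter, one common voicing-irregularity measure; the band edges and example values are illustrative assumptions.

```python
import numpy as np

def spectral_tilt(frame, fs, f_lo=60.0, f_hi=5000.0):
    """Slope (dB/octave) of a line fit to the log magnitude spectrum:
    a crude proxy for damping of the harmonics / vocal effort."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    keep = (freqs > f_lo) & (freqs < f_hi)
    octaves = np.log2(freqs[keep] / f_lo)
    db = 20 * np.log10(spec[keep] + 1e-12)
    slope, _intercept = np.polyfit(octaves, db, 1)
    return slope

def jitter(periods):
    """Mean absolute difference of consecutive pitch periods, relative to the
    mean period: one common voicing-irregularity measure."""
    periods = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

# A 1/k harmonic series has a tilt of roughly -6 dB/octave:
fs = 16000
t = np.arange(int(0.03 * fs)) / fs
frame = sum(np.sin(2 * np.pi * k * 120 * t) / k for k in range(1, 30))
print(round(spectral_tilt(frame, fs), 1))   # more negative = weaker high harmonics
print(round(jitter([8.3e-3, 8.4e-3, 8.2e-3, 8.5e-3]), 3))  # ~0.024
```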

Journal ArticleDOI
01 Aug 1997-Lingua
TL;DR: Experimental evidence supports the claim that low-level coarticulatory effects and auditory misparsing account for s-aspiration, sibilant (de)voicing, and s-affrication.

Proceedings ArticleDOI
21 Apr 1997
TL;DR: The hidden Markov model (HMM) based minimum mean square error (MMSE) estimator is extended to incorporate a ternary voicing state and applied to a harmonic representation of voiced speech, improving noise reduction during voiced sounds.
Abstract: This paper describes a technique for reduction of non-stationary noise in electronic voice communication systems. Removal of noise is needed in many such systems, particularly those deployed in harsh mobile or otherwise dynamic acoustic environments. The proposed method employs state-based statistical models of both speech and noise, and is thus capable of tracking variations in noise during sustained speech. This work extends the hidden Markov model (HMM) based minimum mean square error (MMSE) estimator to incorporate a ternary voicing state, and applies it to a harmonic representation of voiced speech. Noise reduction during voiced sounds is thereby improved. Performance is evaluated using speech and noise from standard databases. The extended algorithm is demonstrated to improve speech quality as measured by informal preference tests and objective measures, to preserve speech intelligibility as measured by informal diagnostic rhyme tests, and to improve the performance of a low bit-rate speech coder and a speech recognition system when used as a pre-processor.
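The combination step at the heart of an HMM-based MMSE estimator can be sketched compactly: the spectral gain is a posterior-weighted mixture of per-state Wiener gains, and the paper's extension amounts to letting the speech states carry a ternary voicing label (voiced/unvoiced/silence). The sketch below is schematic, with made-up numbers; it is not the authors' implementation.

```python
import numpy as np

# Schematic of the HMM-MMSE combination step: the clean-speech spectral gain is
# a posterior-weighted mixture of per-state Wiener gains. In the paper's
# extension, the speech states carry a ternary voicing label
# (voiced / unvoiced / silence). All numbers below are made up.

def mmse_gain(speech_psd, noise_psd, posteriors):
    """G(f) = sum_s p(s | noisy frame) * S_s(f) / (S_s(f) + N(f))."""
    wiener = speech_psd / (speech_psd + noise_psd)   # (n_states, n_bins)
    return posteriors @ wiener                       # -> (n_bins,)

n_bins = 129
freqs = np.linspace(0.0, 1.0, n_bins)                # normalized frequency
speech_psd = np.stack([
    np.exp(-8.0 * freqs),          # "voiced": energy concentrated at low freq
    0.3 * np.ones(n_bins),         # "unvoiced": flat, fricative-like
    1e-4 * np.ones(n_bins),        # "silence"
])
noise_psd = 0.1 * np.ones(n_bins)
posteriors = np.array([0.7, 0.2, 0.1])               # p(state | current frame)

gain = mmse_gain(speech_psd, noise_psd, posteriors)
print(gain[0].round(2), gain[-1].round(2))           # ~0.79 at DC, ~0.15 at Nyquist
```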

01 Jan 1997
TL;DR: This article reviews the current state of the art on the organization of the devoicing gesture in speech, discussing the influence of place and manner of articulation on the devoicing gesture in single voiceless consonants.
Abstract: This paper reviews the current state of our knowledge about the organization of the devoicing gesture in speech. First of all, the influence of place and manner of articulation on the devoicing gesture in single voiceless consonants is discussed. This provides the background for consideration of coarticulatory effects in two main classes of consonantal sequences. The first class involves sequences of a voiceless stop (or fricative) plus a sonorant (e.g. /pl/). It is well known that the sonorant can undergo devoicing, induced by the coarticulatory effect of the adjacent voiceless consonant. It is much less clear if and how laryngeal-oral interarticulatory coordination is modified with respect to the pattern found for single voiceless consonants. This could be of great theoretical interest, since there are reports that the total duration of voicelessness in e.g. /pl/ is longer than in /p/. It would be intriguing if the devoicing gesture were accordingly longer in the former case, as it is not clear what current theory of coarticulation could handle this. The second class involves sequences of purely voiceless sounds. Here the view is of coarticulation as coproduction: the questions are which sounds in consonant sequences are associated with a separate laryngeal gesture, and how multiple gestures blend. While a considerable amount is known about the laryngeal movements per se, it is argued (as for the first class of sequences) that the laryngeal findings need to be linked more closely to improved knowledge of the organization of the relevant oral gestures.

01 Dec 1997
TL;DR: This paper examines the phonetic characteristics of the Tfuea dialect of Tsou, an Austronesian language spoken in Southern Taiwan; in examining the consonant and vowel inventories in phonetic detail, it clears up several points of disagreement in previous descriptions of the language.
Abstract: This paper examines the phonetic characteristics of the Tfuea dialect of Tsou, an Austronesian language spoken in Southern Taiwan. The authors employ both acoustic and auditory analyses; as such, it represents the first instrumental study of Tsou. As the consonant and vowel inventories of the language are examined in phonetic detail, several points of disagreement in previous descriptions of the language are cleared up. The analyses include vowel formant measures, consonant voicing and VOT by place of articulation, and intrinsic pitch of vowels. In addition to the segmental description, there is a preliminary investigation of the consonant clusters, many of which are only rarely attested in the world's languages.

Journal ArticleDOI
TL;DR: This paper investigated the effect of linguistic experience on the duration of the closure interval (CI) preceding word-initial stop consonants and found that Spanish speakers use the CI to help distinguish the voicing characteristics of stops in word-initial position and that bilinguals reweight the different voicing cues as a function of language mode.
Abstract: This study investigated the effect of linguistic experience on the duration of the closure interval (CI) preceding word-initial stop consonants. Native speakers of English (NE), Spanish (NS), and Spanish-English bilinguals produced sentences containing words beginning with either a voiced or a voiceless stop consonant. As is typical for word-initial stops in English, no difference in the CI occurred between voiced and voiceless consonants for the NE speakers. The NS speakers produced the voiceless stops with CIs similar to the NE voiced stops (consistent with the fact that both are classified phonetically as short-lag stops). The voiced stops, however, had significantly shorter CIs. Like the NE speakers, the bilinguals in the English mode produced voiced and voiceless stops with equal CIs. In the Spanish mode, the bilinguals maintained a distinction in CI between voiced and voiceless stops, although their CIs were significantly different from those of their NS counterparts. The results suggest that Spanish speakers use the CI to help distinguish the voicing characteristics of stops in word-initial position and that bilinguals reweight the different voicing cues as a function of language mode.
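The closure interval is a simple durational measurement, and a rough extraction procedure is easy to sketch. In the Python/NumPy code below the closure is taken as the longest contiguous low-energy run; the frame size and silence threshold are illustrative, not the study's.

```python
import numpy as np

def short_time_db(x, fs, win=0.005):
    """Frame RMS in dB relative to the utterance peak (5 ms frames)."""
    n = int(win * fs)
    m = len(x) // n
    rms = np.sqrt(np.mean(x[: m * n].reshape(m, n) ** 2, axis=1))
    return 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12), win

def closure_interval(x, fs, silence_db=-35.0):
    """Duration (s) of the longest contiguous low-energy run: a crude stand-in
    for the stop closure preceding the release burst."""
    db, win = short_time_db(x, fs)
    best = run = 0
    for quiet in db < silence_db:
        run = run + 1 if quiet else 0
        best = max(best, run)
    return best * win

# Synthetic vowel + 80 ms closure + burst + vowel:
fs = 16000
rng = np.random.default_rng(0)
vowel = np.sin(2 * np.pi * 150 * np.arange(int(0.1 * fs)) / fs)
x = np.concatenate([vowel, np.zeros(int(0.08 * fs)),
                    0.5 * rng.standard_normal(int(0.01 * fs)), vowel])
print(round(closure_interval(x, fs), 3))   # ~0.08
```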

BookDOI
31 Jan 1997
TL;DR: This volume collects chapters on laryngeal function, voice source characteristics, and articulatory organization, including Gunnar Fant's revisiting of the LF-model of glottal flow, Janet Pierrehumbert on consequences of intonation for the voice source, Noriko Umeda's fundamental frequency rule for English discourse, and Hajime Hirose on physiological and acoustical correlates of the voicing distinction in oesophageal speech.
Abstract:
Part 1, Background: speech - a physicist remembers, Manfred R. Schroeder.
Part 2, Laryngeal functions in speech: male-female differences in anterior commissure angle, Minoru Hirano et al.; correlations among intrinsic laryngeal muscles during speech gestures, Christy L. Ludlow et al.; regulation of fundamental frequency with a physiologically-based model of the larynx, Ingo R. Titze; high-speed digital image analysis of temporal changes in vocal fold vibration in tremor, Shigeru Kiritani and Seiji Niimi; phonetic control of the glottal opening, Masayuki Sawashima.
Part 3, Voice source characteristics in speech: frequency domain analysis of glottal flow - the LF-model revisited, Gunnar Fant; consequences of intonation for the voice source, Janet Pierrehumbert; fundamental frequency rule for English discourse, Noriko Umeda; physiological and acoustical correlates of voicing distinction in oesophageal speech, Hajime Hirose.
Part 4, Articulatory organization: the postalveolar fricatives of Polish, Morris Halle and Kenneth N. Stevens; a note on the durations of American English consonants, Thomas H. Crystal and Arthur S. House; articulatory coordination and its neurobiological aspects, Shinji Maeda and Kiyoshi Honda; token-to-token variation of tongue-body vowel targets - the effect of context, Joseph S. Perkell and Marc H. Cohen; the phonetic realization of the haiku form in Estonian poetry, compared to Japanese, Ilse Lehiste; synthesis and coding of speech using physiological models, M. Mohan Sondhi.
Part 5, Verbal behaviour - sound structure, information structure: comparison of speech sounds - distance vs. cost metrics, John J. Ohala; a note on Japanese passives, James D. McCawley; sentence production and information, Hiroya Fujisaki.
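Fant's chapter concerns the LF-model of the glottal flow derivative, whose two-segment shape (an exponentially growing sinusoid up to the main excitation, then an exponential return phase) can be sketched briefly. The parameter values below are illustrative, and the area-balance constraint that normally fixes the growth rate alpha is skipped, so the pulse is only approximately flow-balanced.

```python
import numpy as np

def lf_pulse(fs=16000, f0=120.0, tp=0.45, te=0.55, ta=0.01):
    """One LF glottal flow derivative pulse; tp, te, ta are fractions of T0."""
    T0 = 1.0 / f0
    tp, te, ta = tp * T0, te * T0, ta * T0
    t = np.arange(int(fs * T0)) / fs
    Ee = 1.0                                   # main excitation strength

    # Return-phase constant: solve eps * ta = 1 - exp(-eps * (T0 - te)).
    eps = 1.0 / ta
    for _ in range(20):                        # quick fixed-point iteration
        eps = (1.0 - np.exp(-eps * (T0 - te))) / ta

    alpha = 3.0 / te                           # hand-picked growth rate (assumption)
    E0 = -Ee / (np.exp(alpha * te) * np.sin(np.pi * te / tp))

    opening = E0 * np.exp(alpha * t) * np.sin(np.pi * t / tp)
    closing = -(Ee / (eps * ta)) * (np.exp(-eps * (t - te)) - np.exp(-eps * (T0 - te)))
    return t, np.where(t <= te, opening, closing)

t, E = lf_pulse()
print(len(E), round(float(E.min()), 2))        # one period; minimum near -Ee
```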

Journal ArticleDOI
TL;DR: This study investigated the potential influence of alterations in the temporal structure of speech produced during simultaneous communication on the perception of final consonant voicing and indicated that accurate perception of final consonant voicing was not impaired by the changes in the temporal structure of speech that accompany simultaneous communication.

Journal ArticleDOI
TL;DR: In this paper, recordings were made of three proficient electronic larynx users pronouncing the words tea, D, toe, doe, Kate, gate, cot, got, embedded in a frame sentence, 13 times each.
Abstract: Voiceless consonants are problematic for the electronic larynx user and the ability, or lack of it, to signal the presence of a voiceless consonant in a word may have a crucial effect on the intelligibility of the word to a normal listener. In order to determine the acoustic and perceptual properties of voiceless consonants in electronic larynx speech, recordings were made of three proficient electronic larynx users pronouncing the words tea, D, toe, doe, Kate, gate, cot, got, embedded in a frame sentence, 13 times each. Perception tests were carried out, using panels of listeners, and the results demonstrated successful identification of voiceless consonant targets at rates well above chance for the productions by all three speakers. Acoustic analysis suggested slightly different strategies on the part of the different subjects. All introduced an interval of friction noise after the closure release, and one subject employed rapid switching on and off of the device in order to create an analogue of voice-...

01 Jan 1997
TL;DR: In this paper, the authors propose that the two primes conventionally used to denote nasality and voicing are identical, and that the difference is determined by the notion of headship: the headed prime contributes voicing and its headless counterpart manifests itself as nasality.
Abstract: This paper addresses the apparently paradoxical behaviour of nasals in Yamato Japanese, where voice is active for nasals in postnasal voicing but inactive in Rendaku (Ito et al. 1995). In order to explain this paradox, I propose that the two primes conventionally used to denote nasality and voicing are identical, and that the difference is determined by the notion of headship: the headed prime contributes voicing, and its headless counterpart manifests itself as nasality. These proposals are presented within the context of Element Theory (Kaye et al. 1985; Harris & Lindsey 1995), where output representations are redundancy-free, refer only to privative primes, and are fully interpretable.


Proceedings ArticleDOI
12 Oct 1997
TL;DR: Spectral analysis of the region of first formant onset in the raw stimuli shows that processing by the first stage of the model, mimicking the functions of the peripheral auditory system, is not essential to the observed behavior.
Abstract: Important aspects of the voiced/unvoiced categorization of synthetic syllable-initial stop consonants are reproduced by a two-stage biocybernetic simulation of the auditory system. This behavior is emergent - it is not explicitly programmed into the model - and no fine timing information is necessary. Unlike real (human and animal) listeners, the computational auditory model can be systematically manipulated and probed to determine the basis of its behavior. This reveals the importance of the region of first formant onset to the perception of voicing for these stimuli. Spectral analysis of this region in the raw stimuli shows that processing by the first stage of the model, mimicking the functions of the peripheral auditory system, is not essential to the observed behavior. Thus, in this case at least, the phonetic perception of voicing is directly recoverable from both acoustic and auditory representations of the stimuli.
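The probe described above, spectral analysis of the first-formant onset region, is easy to reproduce in outline. The sketch below windows a short stretch at a given onset time and returns its magnitude spectrum; the window length, band edges, and test signal are illustrative assumptions.

```python
import numpy as np

def onset_spectrum(x, fs, onset_s, win_s=0.020):
    """Magnitude spectrum (dB) of a short Hann-windowed stretch starting at
    the (externally determined) F1 onset time."""
    n = int(win_s * fs)
    seg = x[int(onset_s * fs): int(onset_s * fs) + n] * np.hanning(n)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, 20 * np.log10(np.abs(np.fft.rfft(seg)) + 1e-12)

# Low-frequency (F1-region) energy at onset is one crude voicing correlate:
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 500 * t)               # stand-in for an F1 resonance
freqs, spec = onset_spectrum(x, fs, onset_s=0.0)
band = (freqs > 200) & (freqs < 900)
print(round(float(spec[band].max()), 1))      # strong peak near 500 Hz
```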

Journal ArticleDOI
TL;DR: In this paper, the authors compare the voiced and voiceless syllable-initial stop consonants in siSwati and show that the distinction between voiced and voiceless plosives cannot be traced by means of the usual absence/presence of the voice bar in the spectrograms.
Abstract: This paper compares the voicing status of /ptk/ and /bdg/ in siSwati. Expanded waveform shapes and spectrographic representations of our experimental data showed us immediately that the distinction between the voiceless and voiced syllable-initial plosives could not be traced by means of the usual absence/presence of the voice bar in the spectrograms. This resulted in adopting alternative cues, such as voice onset time (VOT) and closure duration. The use of these acoustic cues allowed us to oppose unambiguously the voiced and voiceless syllable-initial stop consonants in siSwati. At the same time, this experiment permitted us to show the extent to which the voice onset of /bdg/ is delayed in siSwati. Further, this analysis resulted in confirming that voiceless and voiced plosives cannot always be opposed by the status of the glottis during their articulatory closure.

01 Jan 1997
TL;DR: In this article, the author refers to the fact that the alveolar fricative occurring in intervocalic position is always [z] with the descriptive label of "intervocalic voicing" and adopts the symbol /S/ to refer to the alveolar fricative "archiphoneme", not specified for [±voice].
Abstract: I will refer to the fact that the alveolar fricative occurring in intervocalic position is always [z] with the descriptive label of “intervocalic voicing”, and I will adopt the symbol /S/ to refer to the alveolar fricative “archiphoneme”, not specified for [±voice]. As I showed in Baroni 1997, intervocalic /S/ voicing is an extremely productive phenomenon of contemporary Northern Italian (for example, it applies in the production of nonsense words and recent loanwords). There is, however, a systematic class of exceptions to it, exemplified by the forms in (2):

Patent
19 Sep 1997
TL;DR: In this patent, a speech waveform and a character string are input to a preprocessing part 102 through an input part 101 and segmented into syllables.
Abstract: PROBLEM TO BE SOLVED: To provide a speech segmentation device that can properly segment, with a small segmentation estimation error, even material that contains voiceless sections. SOLUTION: A speech waveform and a character string are input to a preprocessing part 102 through an input part 101 and segmented into syllables. A voiceless-change flag processing part 103 judges, from information such as the kind of the following syllable, whether each segmented syllable can possibly undergo a voiceless change, and holds the judgement result as a voiceless-change flag for each syllable. A time-length generation part 104 calculates a voicing tempo from the duration of the input speech, computed from the start and end points of the voiced sections, in order to find a duration for each syllable. A pattern-matching part 106 uses the syllable durations to put the character-string information into correspondence with the unit sections delimited by voiceless sections. A syllable-border determination part 107 specifies the syllable borders on the basis of the respective pieces of information on syllable duration.
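The pattern-matching step above aligns the character string against unit sections delimited by voiceless or silent stretches. A crude energy-based sectioner of that kind can be sketched as follows; the thresholds are illustrative and the function is a hypothetical stand-in for what feeds part 106.

```python
import numpy as np

def unit_sections(x, fs, win=0.01, silence_db=-40.0):
    """Split a waveform into contiguous 'sound'/'silence' sections: the coarse
    sectioning that the pattern-matching part aligns syllable strings against.
    Thresholds are illustrative."""
    n = int(win * fs)
    m = len(x) // n
    rms = np.sqrt(np.mean(x[: m * n].reshape(m, n) ** 2, axis=1))
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    loud = db > silence_db
    sections, start = [], 0
    for i in range(1, m + 1):
        if i == m or loud[i] != loud[start]:
            sections.append(("sound" if loud[start] else "silence",
                             round(start * win, 3), round(i * win, 3)))
            start = i
    return sections

fs = 8000
tone = np.sin(2 * np.pi * 150 * np.arange(fs) / fs)
print(unit_sections(np.concatenate([tone, np.zeros(fs // 2), tone]), fs))
# [('sound', 0.0, 1.0), ('silence', 1.0, 1.5), ('sound', 1.5, 2.5)]
```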

01 Jan 1997
TL;DR: This paper argued that the instability of the articulatory and acoustic cues does not lead to indeterminacy as to the appropriate representation of laryngeal oppositions because the true nature of opposition is reinforced by the phonology.
Abstract: The proper characterization of voicing distinctions has long been controversial, a problem that arises in large part because of the unstable phonetic cues that are associated with laryngeal oppositions. The articulatory and acoustic cues that serve to signal the voiced/voiceless opposition appear to vary from language to language. These cues include vocal cord vibration, duration, tenseness, glottal spreading or constriction and articulatory force (see Kohler 1979, 1984; Lisker 1986; Keating 1984, 1990; Doherty 1993). It is my contention that the instability of the cues does not lead to indeterminacy as to the appropriate representation of laryngeal oppositions, because the true nature of the opposition is reinforced by the phonology. In Avery (1996), it is argued that what has traditionally been referred to as the voicing opposition may in fact be represented in three different ways with respect to the specification of the segments involved in the opposition. Voiced obstruents may be marked by the presence of the feature [voice], the presence of the SV node (see Rice & Avery 1989, 1990, 1993; Rice 1992, 1993), or the absence of any laryngeal specification. I refer to the first option as Laryngeal Voice (LV), the second as Sonorant Voice (SV) and the third as Contextual Voice (CV). I further assume that a bare Laryngeal node may be phonetically enhanced through the addition of the feature Spread Glottis (SG), yielding voiceless aspirates. This is typically the case in languages such as English, German and Turkish, which are analyzed as CV languages.

Journal ArticleDOI
TL;DR: The task of hawking that the authors attempted proved very useful in speech therapy for PD patients who exhibited the freezing phenomenon of the vocal cords with kinesie paradoxale.
Abstract: A 62-year-old man with juvenile Parkinson's disease was reported. When L-Dopa was working, the patient felt difficulty in voicing although he could walk smoothly; when L-Dopa was not working, his difficulty in voicing disappeared but he was unable to walk. This discrepancy between voicing and walking is discussed. Laryngofiberscopic examination showed the following intriguing findings. When L-Dopa was working, the patient's vocal cords assumed the hyperabduction position. Also, during attempts at phonation, the vocal cords developed a tendency to adduct but were unable to do so. This movement seemed to correspond to a "freezing" phenomenon in walking. The adduction tendency of the vocal cords was ameliorated temporarily by voluntarily making a cough instead of voicing. Such a phenomenon appeared as a freezing of vocal cord movement with kinesie paradoxale. Two hypotheses were raised to explain this "see-saw" phenomenon between voicing and walking. First, the mechanism of the freezing phenomenon might differ for voicing and walking. Second, the threshold for the effectiveness of L-Dopa might differ between the intrinsic laryngeal muscles controlling voicing and the limb and truncal muscles controlling walking. The task of hawking which we attempted was very useful in speech therapy for PD patients who exhibited the freezing phenomenon of the vocal cords with kinesie paradoxale.

Proceedings Article
01 Jan 1997
TL;DR: A robust labelling system (AMULET) for the multi-sensor ACCOR speech database is defined, and two efficient Voiced/Unvoiced/Silence detectors for the acoustic signals and a third for the laryngographic signal are presented.
Abstract: Many researchers have seen in articulation an intermediate level of representation. In gestural phonetic theory, the units are articulatory gestures. In order to assess this theory against observed parameters, we have defined a robust labelling system (AMULET) for the multi-sensor ACCOR speech database. The main articulatory events sought are Voice Onset and Voice Termination on both the acoustic and laryngographic signals. We present here two efficient Voiced/Unvoiced/Silence detectors for the acoustic signals and a third one for the laryngographic signal.
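A minimal energy-plus-zero-crossing-rate sketch shows the shape of a Voiced/Unvoiced/Silence detector of the kind evaluated above; Voice Onset and Voice Termination then fall out as the boundaries of the V-runs. AMULET's actual detectors are not described by this code, and all thresholds are illustrative.

```python
import numpy as np

def vus_labels(x, fs, win=0.02, silence_db=-45.0, zcr_voiced=0.12):
    """Label 20 ms frames Voiced / Unvoiced / Silence from frame energy and
    zero-crossing rate. Thresholds are illustrative, not AMULET's."""
    n = int(win * fs)
    m = len(x) // n
    frames = x[: m * n].reshape(m, n)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.where(db < silence_db, "S", np.where(zcr < zcr_voiced, "V", "U"))

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
rng = np.random.default_rng(1)
x = np.concatenate([np.sin(2 * np.pi * 140 * t),          # voiced stretch
                    0.3 * rng.standard_normal(len(t)),    # unvoiced stretch
                    np.zeros(len(t))])                    # silence
print("".join(vus_labels(x, fs)))  # runs of V, then U, then S
```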