
Showing papers on "Voice" published in 2002


Journal ArticleDOI
TL;DR: This study examines acoustic and aerodynamic characteristics of consonants in standard Korean and in Cheju, an endangered Korean language, and suggests that the fricative /s/ is better categorized as "lenis" rather than "aspirated".

370 citations


Journal ArticleDOI
TL;DR: The perceptual dominance of f0 over VOT for lax stops is consistent with the size of the f0 differences in word- (and phrase-) initial position, as well as the prominent role of the resulting tonal patterns in Korean intonational phonology.

138 citations


Journal ArticleDOI
TL;DR: Acoustic analysis was used to examine whether speech errors arise at the lexical, segmental, or sub-featural level in speech production; the results provide evidence for the psychological reality of phonological segments and words as units in the speech production process.

133 citations


Journal ArticleDOI
TL;DR: Portuguese fricatives were analyzed in ways designed to enhance the description of the language and to increase the understanding of fricative production mechanisms.

128 citations


Journal ArticleDOI
TL;DR: In this article, the authors present experimental results that support the view that German has underlying [spread glottis] stops, not [voice] stops, and that the intervocalic voiced stops arise because of passive voicing of the non-[spread glottis] stops.
Abstract: It is well known that initially and when preceded by a word that ends with a voiceless sound, German so-called ‘voiced’ stops are usually voiceless, that intervocalically both voiced and voiceless stops occur and that syllable-final (obstruent) stops are voiceless. Such a distribution is consistent with an analysis in which the contrast is one of [voice] and syllable-final stops are devoiced. It is also consistent with the view that in German the contrast is between stops that are [spread glottis] and those that are not. On such a view, the intervocalic voiced stops arise because of passive voicing of the non-[spread glottis] stops. The purpose of this paper is to present experimental results that support the view that German has underlying [spread glottis] stops, not [voice] stops.

128 citations


Journal ArticleDOI
TL;DR: Results find both a quantity and a voicing effect on vowel durations, though these two effects differ as to how they interact with stress and focus.

115 citations


Journal ArticleDOI
TL;DR: It is argued that extensive larynx lowering and vocal fold slackening can explain the specifics of the voicing feature in Xhosa, and suggested that “slack voice” is a more appropriate term for the relevant Xhosa sounds than “breathy voice”.

57 citations


Journal ArticleDOI
TL;DR: The results show that auditory speech perception performance of children with cochlear implants reaches an asymptote at 76% between 4 and 6 years of implant use, and the hierarchy in speech pattern contrast perception and production was similar between the implanted and normal-hearing children.
Abstract: The purpose of the present study was twofold: 1) to compare the hierarchy of perceived and produced significant speech pattern contrasts in children with cochlear implants, and 2) to compare this hierarchy to developmental data of children with normal hearing. The subjects included 35 prelingual hearing-impaired children with multichannel cochlear implants. The test materials were the Hebrew Speech Pattern Contrast (HeSPAC) test and the Hebrew Picture Speech Pattern Contrast (HePiSPAC) test for older and younger children, respectively. The results show that 1) auditory speech perception performance of children with cochlear implants reaches an asymptote at 76% (after correction for guessing) between 4 and 6 years of implant use; 2) all implant users perceived vowel place extremely well immediately after implantation; 3) most implanted children perceived initial voicing at chance level until 2 to 3 years after implantation, after which scores improved by 60% to 70% with implant use; 4) the hierarchy of phonetic-feature production paralleled that of perception: vowels first, voicing last, and manner and place of articulation in between; and 5) the hierarchy in speech pattern contrast perception and production was similar between the implanted and the normal-hearing children, with the exception of the vowels (possibly because of the interaction between the specific information provided by the implant device and the acoustics of the Hebrew language). The data reported here contribute to our current knowledge about the development of phonological contrasts in children who were deprived of sound in the first few years of their lives and then developed phonetic representations via cochlear implants. The data also provide additional insight into the interrelated skills of speech perception and production.

54 citations


Journal ArticleDOI
TL;DR: The aim is to provide a comprehensive account of the various interactions between consonantal voicing, vowel height and consonant place on a range of acoustic attributes of vowels and stops, in order to propose an explanation for such effects and to compare the present results and interpretations with previous explanations and data.
Abstract: This paper reports an acoustic study of CV sequences in Italian (where C is /b, d, g, p, t, k/ and V is one of the seven Italian vowels in stressed position). It explores the effects of vowel height, consonantal voicing, and place of articulation on a number of acoustic attributes of vowels (duration, f0, F1), and on the duration of the preceding stop closure, VOT and RVOT (defined as the interval from C release to the acoustic vowel onset). The aim is to provide, for Italian, a comprehensive account of all the various interactions between consonantal voicing, vowel height and consonant place on the above acoustic attributes in order to propose an explanation for such effects, and to compare the present results and interpretations with previous explanations and with previous data on Italian and other languages.

51 citations


01 Jan 2002
TL;DR: In this paper, it is argued that the attested rankings correspond to patterns that arise through simple sound changes from phonetic patterns, while the unattested rankings cannot have such an origin.
Abstract: In many languages, a voiceless obstruent cannot occur after a nasal. In some languages such a sequence is avoided through voicing assimilation, but in others through deletion of the nasal or of the obstruent, or changing the nasal into an oral stop. This variety of responses can be expressed in OT through different constraint rankings. Other ways of avoiding such a sequence, such as epenthesis or metathesis, are not attested, even though the corresponding constraint rankings are possible in principle. It is shown that such gaps in factorial typology are pervasive, and cannot be addressed through ad hoc revisions of the constraints or representations. It is argued that the attested rankings correspond to patterns that arise through simple sound changes from phonetic patterns, while the unattested rankings cannot have such an origin. This approach suggests that phonetics influences not only markedness in phonology, but also constraint ranking.

49 citations


Journal ArticleDOI
TL;DR: It is confirmed that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust.
Abstract: Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.

PatentDOI
Ning Bi1, Andrew P. Dejaco1
TL;DR: In this paper, a speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain).
Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font.
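To make the per-font modification concrete, here is a minimal sketch of the kind of transform such a voice font could specify, assuming per-frame pitch and gain tracks have already been extracted; the font names, fields, and values are hypothetical illustrations, not taken from the patent.

```python
import numpy as np

# Hypothetical voice-font table: each font specifies how to modify the received
# parameter tracks (only pitch and gain here; formant/voicing edits would be analogous).
VOICE_FONTS = {
    "deep":  {"pitch_scale": 0.8, "gain_db": +2.0},
    "child": {"pitch_scale": 1.5, "gain_db": 0.0},
    "soft":  {"pitch_scale": 1.0, "gain_db": -6.0},
}

def apply_voice_font(pitch_hz, gain_db, font_name):
    """Return modified per-frame pitch (Hz) and gain (dB) tracks for the selected font."""
    font = VOICE_FONTS[font_name]
    new_pitch = np.asarray(pitch_hz, dtype=float) * font["pitch_scale"]
    new_gain = np.asarray(gain_db, dtype=float) + font["gain_db"]
    return new_pitch, new_gain

# Example: lower a flat 200 Hz contour by 20% and boost its gain by 2 dB.
pitch, gain = apply_voice_font([200.0] * 5, [-20.0] * 5, "deep")
```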

Journal ArticleDOI
TL;DR: Three of the four acoustic measures contributed significantly to the discriminant models that differentiated accurately perceived TE and laryngeal samples, and the values for each measure were higher/longer for the TE group.
Abstract: Tracheoesophageal (TE) speakers often have difficulty producing the voiced/voiceless distinction. This limitation has been attributed to use of the pharyngoesophageal segment as the phonatory source...
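As a rough illustration of how a discriminant model over a handful of acoustic measures can separate two speaker groups, here is a sketch using scikit-learn; the feature names and numbers are invented placeholders, not the study's measures or data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical per-token acoustic measures (columns could be, e.g., VOT, vowel
# duration, voicing duration, and closure duration in ms); values are made up.
X = np.array([
    [45.0, 180.0, 60.0, 90.0],   # TE speaker token
    [50.0, 190.0, 65.0, 95.0],   # TE speaker token
    [25.0, 140.0, 40.0, 70.0],   # laryngeal speaker token
    [22.0, 135.0, 38.0, 68.0],   # laryngeal speaker token
])
y = np.array(["TE", "TE", "laryngeal", "laryngeal"])

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[40.0, 170.0, 55.0, 85.0]]))  # classify a new token
```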

Patent
Ari Heikkinen1
05 Apr 2002
TL;DR: In this paper, a method is presented to minimize the effect of pitch jitter in the voicing determination of sinusoidal speech coders during voiced speech, in which the pitch of the input signal is normalized to a fixed value prior to voicing determination in the analysis frame.
Abstract: In an embodiment of the invention, a method is presented to minimize the effect of pitch jitter in the voicing determination of sinusoidal speech coders during voiced speech. In the method, the pitch of the input signal is normalized to a fixed value prior to voicing determination in the analysis frame. After that, conventional voicing determination approaches can be used on the normalized signal. In experiments, the method has been shown to improve the performance of sinusoidal speech coders during jittery voiced speech by increasing the accuracy of voicing classification decisions for speech signals.
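A minimal sketch of the idea, assuming a per-frame pitch estimate is already available: time-scale the frame so its pitch maps to a fixed reference value, then apply an ordinary autocorrelation-based voicing decision to the normalized frame. The interpolation-based resampling and the threshold are illustrative choices, not the patent's specific procedure.

```python
import numpy as np

def normalize_pitch(frame, f0_hz, f0_ref=100.0):
    """Time-scale a frame so its estimated pitch maps to a fixed reference pitch."""
    ratio = f0_hz / f0_ref                  # ratio > 1 stretches the frame, lowering its pitch
    n_out = int(round(len(frame) * ratio))
    x_old = np.linspace(0.0, 1.0, num=len(frame))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, frame)

def voicing_decision(frame, fs, f0_ref=100.0, threshold=0.4):
    """Normalized-autocorrelation voicing decision around the reference pitch lag.
    Assumes the (normalized) frame spans several reference pitch periods."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False
    lag = int(round(fs / f0_ref))           # expected pitch lag after normalization
    peak = np.max(ac[int(0.8 * lag):int(1.2 * lag) + 1]) / ac[0]
    return peak > threshold
```

After normalization, every voiced frame is compared against a single expected lag rather than a drifting one, which is the intuition behind removing pitch jitter before the voicing decision.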

Journal ArticleDOI
01 May 2002-Lingua
TL;DR: By allowing co-articulation, speech rate and register to play a role in the sound component of the grammar, this analysis accounts for the gradience, and for the great deal of cross-dialectal and dialect-internal variation, exhibited by Spanish spirantization.

Journal ArticleDOI
TL;DR: An investigation of speech recognition features related to voicing (functions that indicate whether the vocal folds are vibrating) shows that voicing features and spectral information are complementary, and that improved speech recognition performance is obtained by combining the two sources of information.

Journal ArticleDOI
TL;DR: The authors examined whether or not stress is a factor in the likelihood of frication and devoicing of coda /b, d, ɡ/ in Spanish dialects.
Abstract: In Spanish, /b, d, ɡ/ are usually spirantized to voiced approximants in all syllabic contexts after a continuant sound. However, in North-Central Peninsular Spanish (NCS), spirantization interacts with coda devoicing, yielding voiceless fricatives. In the majority of cases, coda /b, d, ɡ/ occur in stressed syllables. This work examines whether or not stress is a factor in the likelihood of frication and devoicing of coda /b, d, ɡ/ in this dialect. An acoustic study was conducted of nine native speakers from NCS. These speakers were tested on nonce words with /b, d, ɡ/ in coda position in both stressed and unstressed syllables. Measurements were made of vowel and consonant duration, presence and absence of frication and voicing, and voicing duration. The results show that frication is more likely in stressed syllables than in unstressed syllables. This suggests that in stressed syllables, a higher subglottal pressure produces higher airflow across the glottis, thereby favoring frication. In turn, frication inhibits voicing due to conflicting aerodynamic requirements between the two. We conclude that stress is a factor in spirantization and that it may indirectly affect the voicing properties of /b, d, ɡ/.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effect of audiovisual incongruent monosyllables in French for both voiced and voiceless stop consonants, using two levels of auditory intensity (70 dB vs 40 dB).
Abstract: When presented with an auditory /b/ dubbed onto a visual /g/, listeners sometimes perceive a fused phoneme like /d/, while with the reverse presentation they experience a combination such as /bg/. This phenomenon, reported by McGurk and MacDonald (1976), is here investigated in French for both voiced and voiceless stop consonants, using two levels of auditory intensity (70 dB vs 40 dB). In a first experiment, audiovisual incongruent monosyllables (A/bi/ V/gi/, A/gi/ V/bi/, A/ki/ V/pi/, A/pi/ V/ki/) uttered by a male and a female speaker were recorded and dubbed using analog technology. In a second experiment, the same syllables articulated by the male speaker were recorded and dubbed using digital technology. In a third experiment, the same materials as in the second experiment were used but the presentation procedure of the experimental items was changed: audiovisual incongruent trials were mixed with congruent ones. In the three experiments, the role of voicing and of auditory intensity ...

Proceedings Article
01 Jan 2002
TL;DR: This work attempts to directly exploit prosodic correlates in acoustic modeling of speech for large vocabulary recognition by comparing two methods for using the fundamental frequency and voicing parameters.
Abstract: Prosody has long been studied as a knowledge source in speech processing. We attempt to directly exploit prosodic correlates in acoustic modeling of speech for large vocabulary recognition. We compare two methods for using the fundamental frequency and voicing parameters. The more complex approach starts by modeling prosodic classes and using a representation of their recognized sequences as acoustic features. The simpler approach adds suitably normalized raw values to the conventional mel cepstral coefficients in the observation vectors. The simpler approach achieves modest accuracy gains on the HUB-5 Eval-2001 test set.
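A sketch of the simpler approach, under the assumption that per-utterance z-normalization of log-f0 is an acceptable stand-in for "suitably normalized raw values"; the librosa calls are real, but frame alignment and the toy signal are simplified for illustration.

```python
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 150.0 * t)        # 1 s of a 150 Hz tone as a stand-in signal
hop = 256

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)            # (13, T)
f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)

T = min(mfcc.shape[1], len(f0))
logf0 = np.log(np.where(np.isnan(f0[:T]), 1.0, f0[:T]))     # unvoiced frames -> log(1) = 0
logf0 = (logf0 - logf0.mean()) / (logf0.std() + 1e-8)       # per-utterance z-normalization
voiced = voiced_flag[:T].astype(float)

# Append the normalized f0 track and a voicing flag to each MFCC observation vector.
features = np.vstack([mfcc[:, :T], logf0[None, :], voiced[None, :]])          # (15, T)
```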

Proceedings ArticleDOI
28 Oct 2002
TL;DR: Evaluation using informal subjective listening tests and a few objective parameters indicates that the speech quality (intelligibility and naturalness) of the designed speech coder is better than that of the FS1015 standard coder.
Abstract: An average 16 kb/s low and variable bit rate speech coder based on the local cosine transform (LCT) algorithm for two-way conversational speech is designed for the first time in this paper. The result of a voice activity detector (VAD) based on a support vector machine (SVM) and the classification method of the voicing modes of the GSM half-rate standard for active speech are adopted in the design of the variable bit rate coder. The moderately voiced mode and the strongly voiced mode are combined into a single voicing mode, named the moderately and strongly voiced mode. A few segment vector quantizers of the LCT coefficients are employed for each voicing mode and for silence frames (background noise), and the LBG algorithm is applied to design the codebooks. A fast tree search technique is used to select the vector of the LCT coefficients for each segment. Evaluation using informal subjective listening tests and a few objective parameters indicates that the speech quality (intelligibility and naturalness) of the designed speech coder is better than that of the FS1015 standard coder. The new coder is also more robust than the FS1015 standard coder, which makes it suitable for speech coding in a variety of environments.

Journal ArticleDOI
TL;DR: The phonetic manifestation of distinctive plosive types and click accompaniments in Xhosa was investigated with measurements of voice onset time (VOT), closure duration, voicing during closure, and burst amplitude.
Abstract: The phonetic manifestation of distinctive plosive types and click accompaniments in Xhosa was investigated with measurements of voice onset time (VOT), closure duration, voicing during closure, and burst amplitude. There is a high degree of interspeaker as well as token-to-token variability in the voiceless unaspirated plosives and clicks concerning their pronunciation with or without audible ejection. The plosives are much more frequently ejective than the corresponding clicks. If present, ejection is manifested by increased VOT, burst amplitude, or both. Duration of voicing during closure is substantial only in the implosive, but not in the 'voiced' plosives and clicks. After nasals the percentage of voicing during closure is high in 'voiced' plosives due to the very short closure duration found in that context; in the post-nasal 'voiced' clicks closure is mostly reduced to zero. Aspirated plosives and clicks in Xhosa show VOT values that are on average relatively long when compared to other languages. Closure duration tends to be shorter in aspirated plosives and clicks than in other categories.
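For readers unfamiliar with the reported measures, here is a small worked example of how they are typically derived from annotated landmark times; the millisecond values are hypothetical, not taken from the Xhosa data.

```python
# Hypothetical landmark times (ms) for one annotated stop token.
closure_onset = 120.0      # oral closure begins
burst_release = 210.0      # stop burst (release)
voicing_onset = 265.0      # onset of periodic voicing
voiced_in_closure = 30.0   # ms of voicing observed during the closure

closure_duration = burst_release - closure_onset                          # 90 ms
vot = voicing_onset - burst_release                                       # +55 ms (long-lag / aspirated range)
percent_closure_voicing = 100.0 * voiced_in_closure / closure_duration    # ~33% voiced closure
```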

01 Jan 2002
TL;DR: A new automatic analysis method based on a speech production process expressed by an autoregressive model with an exogenous input (ARX model) and a mathematical voicing source model is proposed, with the aim of establishing a method of analyzing voicing source and formant parameters with high accuracy for the speech of not only males but also females and children.
Abstract: In this study, we propose a new automatic analysis method based on a speech production process expressed by an autoregressive model with an exogenous input (ARX model) and a mathematical voicing source model, with the aim of establishing a method of analyzing voicing source and formant parameters with high accuracy for the speech of not only males but also females and children. The features of the proposed method are as follows. 1) The formant parameters of high-pitched speech can be stably estimated by placing voicing source pulse trains that correspond to multiple pitches in the analysis window. 2) Low formant frequencies, which are easily affected by voicing source characteristics, can be estimated with high accuracy. 3) The estimation error caused by the incompleteness of the model can be reduced by introducing a prefilter that takes into account the spectral tilt of the voicing source.
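The core of ARX analysis is a linear least-squares fit of vocal-tract (AR) coefficients given an assumed voicing-source input. The sketch below shows a generic ARX estimator in that spirit; it omits the paper's multi-pulse analysis window and spectral-tilt prefilter, and the function and parameter names are my own.

```python
import numpy as np

def fit_arx(y, u, na=10, nb=2):
    """Least-squares fit of an ARX model
        y[n] + a1*y[n-1] + ... + a_na*y[n-na] = b0*u[n] + ... + b_nb*u[n-nb] + e[n],
    where y is the speech frame and u an assumed-known voicing-source signal.
    Returns the AR (formant-related) coefficients a and the input coefficients b."""
    n0 = max(na, nb)
    rows, targets = [], []
    for n in range(n0, len(y)):
        past_y = [-y[n - i] for i in range(1, na + 1)]
        past_u = [u[n - j] for j in range(0, nb + 1)]
        rows.append(past_y + past_u)
        targets.append(y[n])
    theta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
    return theta[:na], theta[na:]
```

Formant frequencies and bandwidths can then be read off the roots of the estimated AR polynomial, which is where conditioning on an explicit source model helps stabilize estimates for high-pitched speech.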

Journal ArticleDOI
01 Jan 2002
TL;DR: This article proposes the Dispersion-Focalisation Theory (DFT), which predicts vowel systems using two competing perceptual constraints weighted by two parameters, λ and α: increasing auditory distances between vowel spectra (dispersion), and increasing the perceptual salience of each spectrum through formant proximities (focalisation).
Abstract: In the research field initiated by Lindblom & Liljencrants in 1972, we illustrate the possibility of giving substance to phonology, predicting the structure of phonological systems with nonphonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). We proposed for vowel systems the Dispersion-Focalisation Theory (Schwartz et al., 1997b). With the DFT, we can predict vowel systems using two competing perceptual constraints weighted with two parameters, λ and α. The first aims at increasing auditory distances between vowel spectra (dispersion); the second aims at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on research in physics - namely, phase space (λ,α) and polymorphism of a given phase, or superstructures in phonological organisations (Vallee et al., 1999) - which allow us to generate 85.6% of the 342 UPSID systems with 3 to 7 vowel qualities. No similar theory for consonants seems to exist yet. Therefore we present in detail a typology of consonants, and then suggest ways to explain the predominance of plosives over fricatives and of voiceless over voiced consonants by i) comparing them with language acquisition data at the babbling stage and looking at the capacity to acquire relatively different linguistic systems in relation to the main degrees of freedom of the articulators; ii) showing that the places “preferred” for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, make possible or impossible, the needed articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000). We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values result in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world’s languages can be explained partly by sensorimotor constraints, and argue that phonology can actually take part in a theory of Perception-for-Action-Control.
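A schematic rendering of the DFT cost being minimized may help; the exact perceptual distance metric and focalization terms in Schwartz et al. (1997b) are more elaborate, so the formula below should be read as a simplified sketch of the two weighted constraints rather than the theory's precise definition.

```latex
% Schematic DFT energy: dispersion penalizes perceptually close vowel pairs,
% focalization (weighted by \alpha) rewards vowels whose formants converge;
% \lambda weights the second spectral dimension in the distance metric.
E \;=\; \underbrace{\sum_{i<j}\frac{1}{d_{ij}^{2}}}_{\text{dispersion}}
\;-\; \alpha \underbrace{\sum_{i}\frac{1}{\bigl(F'_{2,i}-F_{1,i}\bigr)^{2}}}_{\text{focalization}},
\qquad
d_{ij}^{2} \;=\; \bigl(F_{1,i}-F_{1,j}\bigr)^{2} \;+\; \lambda\,\bigl(F'_{2,i}-F'_{2,j}\bigr)^{2}.
```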

Journal ArticleDOI
TL;DR: The data support the influence of both general auditory abilities and unique speech processes on categorical perception of speech, with different category boundaries for speech and non-speech stimuli in Hebrew and across languages.
Abstract: The nature of the mechanism responsible for the categorical labeling of stimuli is not clear. One hypothesis suggests that categorization is limited by the 'natural sensitivities' of the auditory system. The alternative hypothesis suggests that categorization is mediated by a special speech mode and is influenced by how speech is produced. The present study attempts to provide some insight into this dilemma by evaluating categorical perception (CP) in speech and non-speech stimuli and across languages. Specifically, the goals of the present study were (1) to compare phonetic boundaries of Hebrew voicing to categorical boundaries (CB) of a two-tone complex which varies in the relative timing of the two tones (TOT) [TOT stimuli are considered to be the non-speech analog to voice-onset time (VOT)], and (2) to re-establish the CB values of the non-speech analog to voicing in American-English speakers using the same TOT continua as the Hebrew speakers and to compare them to the CB of Hebrew-speaking subjects. Our assumption was that if CP is mediated by basic auditory sensitivity then we expect similar CB for speech and non-speech stimuli and no effect of language on CB. If, however, a special speech code determines CP, then phonetic boundaries are expected to be different from the CB of non-speech stimuli and across languages. Of particular interest is the special case of Hebrew, whose voiced-voiceless distinction in production is very different from that in English. Twelve Hebrew-speaking adults and 12 American-English speaking adults participated in this study. Stimuli consisted of (a) a two-tone complex continuum that varied in the relative onset time of the lower tone from a lead of -100 ms to a lag of +50 ms in 10 ms steps, and (b) a /ba-pa/ continuum which varied in VOT values similar to (a). Subjects identified TOT stimuli as belonging to one of three categories: leading, simultaneous, or lagging. VOT stimuli were labeled as /ba/ or /pa/. Results show (a) a different phonetic boundary for Hebrew voicing compared to published data on English voicing, (b) different category boundaries for speech and non-speech stimuli in Hebrew, (c) a phonetic boundary for Hebrew voicing that does not align with the VOT values of production, and (d) very similar CB for TOT stimuli in Hebrew- and American-English-speaking subjects. The data support the influence of both general auditory abilities and unique speech processes on categorical perception of speech.
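As an illustration of how a category boundary (CB) is read off an identification function, the sketch below fits a logistic curve to labeling proportions along a VOT continuum and takes its 50% crossover; the continuum steps and response rates are invented for illustration, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical proportion of /pa/ responses at each VOT step (ms) of a /ba-pa/ continuum.
vot_ms = np.array([-60, -40, -20, 0, 10, 20, 30, 40, 50], dtype=float)
prop_pa = np.array([0.02, 0.05, 0.08, 0.20, 0.45, 0.80, 0.93, 0.97, 0.99])

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, vot_ms, prop_pa, p0=[10.0, 0.1])
print(f"estimated category boundary: {boundary:.1f} ms VOT")
```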

01 Jan 2002
TL;DR: Steriade et al. as mentioned in this paper found that sounds which are less perceptible are more likely to be altered than more salient sounds, the rationale being that the loss of information resulting from a change in a sound which is difficult to perceive is not as great as the loss resulting from a change in a more salient sound.
Abstract: It has been hypothesized that sounds which are less perceptible are more likely to be altered than more salient sounds, the rationale being that the loss of information resulting from a change in a sound which is difficult to perceive is not as great as the loss resulting from a change in a more salient sound. Kohler (1990) suggested that the tendency to reduce articulatory movements is countered by perceptual and social constraints, finding that fricatives are relatively resistant to reduction in colloquial German. Kohler hypothesized that this is due to the perceptual salience of fricatives, a hypothesis which was supported by the results of a perception experiment by Hura, Lindblom, and Diehl (1992). These studies showed that the relative salience of speech sounds is relevant to explaining phonological behavior. An additional factor is the impact of different acoustic environments on the perceptibility of speech sounds. Steriade (1997) found that voicing contrasts are more common in positions where more cues to voicing are available. The P-map, proposed by Steriade (2001a, b), allows the representation of varying salience of segments in different contexts. Many researchers have posited a relationship between speech perception and phonology. The purpose of this paper is to provide experimental evidence for this relationship, drawing on the case of Turkish /h/ deletion.

Proceedings ArticleDOI
13 May 2002
TL;DR: The knowledge-based acoustic parameters (APs) optimized within the EBS framework were compared to mel-frequency cepstral coefficients in an HMM-based recognition system; the results showed that the APs achieve higher recognition accuracy.
Abstract: In this paper, we discuss an event-based recognition system (EBS) which is based on phonetic feature theory and acoustic phonetics. First, acoustic events related to the manner phonetic features are extracted from the speech signal. Second, based on the manner acoustic events, information related to the place phonetic features and voicing are extracted. Most recently, we focused on place and voicing information needed to distinguish among the stop consonants /t,d,p,b/. Using the E-set utterances from the TI46 database, EBS achieved 75.7% overall word accuracy. Further, the knowledge-based acoustic parameters (APs) optimized within the EBS framework were compared to the mel-frequency cepstral coefficients in an HMM-based recognition system. The results on the E-set task showed that the APs achieve a higher recognition accuracy.

Dissertation
23 Nov 2002
TL;DR: In this paper, it is argued that voice, and, more generally, all features associated with "voice onset time" (VOT) are not segmental features; rather, VOT-values and length contrasts are assigned similar representations.
Abstract: In all phonological models of syllable structure, 'sonority', and, in particular, one of its main correlates — voice(lessness) — are intrinsic properties of segments, as opposed, for example, to length, which also plays a major role in syllable structure, and was shown to be a prosodic effect by autosegmental phonology, thanks to the notion of skeletal positions and the Obligatory Contour Principle. This has particular importance today, since the segmental nature of sonority may naturally be viewed as evidence for 'output-based' and non-representational approaches to the syllable. The basic claim here is that voice, and, more generally, all features associated with 'voice onset time' (VOT) — voice, voicelessness and aspiration (henceforth VOT-values) — are not segmental features; rather, VOT-values and length contrasts are to be assigned similar representations. It is proposed that phonological words are characterized by two parallel curves which follow from the association with the skeleton of two autonomous and antinomic tiers: the O-tier, where 'onsets' are the roots of consonants, is supposed to stand for (articulatory) 'tension'; the N-tier, where 'nuclei' are the roots of vowels, represents (perceptual) 'sonority'. VOT-values and length contrasts are, as it were, contextual allophones of such abstract invariants: aspiration and voice emerge from O-spreading to the following N-slot, and from N-spreading to the preceding O-slot respectively; consonantal and vocalic length results from O-spreading to the preceding N-slot, and from N-spreading to the following O-slot respectively. The representation of VOT-values and length in terms of O/N interactions provides a simple and straightforward solution to at least six problems: (a) why can no segment contain the sole 'feature' [voiced] or [aspirated]? (b) why do gemination and voice behave as the poles of the same 'strength scale'? (c) why are voice contrasts much more frequent among consonants than among vowels? (d) why is compensatory lengthening impossible before a vowel? (e) why are both initial aspiration and final voicing 'edge-specific' marked phenomena? (f) why does voicing normally take place in intervocalic position, but fail to occur either word-initially or after a coda? Finally, voicing and vowel lengthening are shown to be alternative lenition strategies. Beyond its explanatory power, the hypothesis of O/N interactions has important implications on cognitive grounds. By denying any symbolic status to aspiration and voice, we are led to reduce the number of segmental primitives. By assuming that both VOT-values and length contrasts are segmental effects of onset and nucleus weight, defined as the number of slots onsets and nuclei are associated with, we are assigning a representational basis to syllables: 'syllables' exist wherever VOT and/or length contrasts may emerge. This runs counter to the claims of output-based approaches, where syllables emerge from smaller units. A contrario, the present theory is likely to lend phonological support to ideas grounded quite independently, in brain studies, like MacNeilage's distinction between frame and content. In particular, the assumed autonomy of syllabic structure, i.e. of VOT/length, vis-a-vis segmental material proper is consonant with "the idea that speech production branches into metrical and segmental processes, and that syllabic frames are conceptually separable from their phonemic content".

Journal ArticleDOI
TL;DR: In this paper, the authors examined whether fricatives at morpheme junctures in OE compounds and quasi-compounds conform to the rule of voicing between voiced sounds that applies morpheme-internally, i.e., whether a voiced or a voiceless fricative is to be expected in such words.
Abstract: Old English fricatives at points of morpheme juncture are studied to determine whether they conform to the rule of voicing between voiced sounds that applies morpheme-internally. Should we expect a voiced or a voiceless fricative in words like OE heorð-weorod, Wulfweard, and stīðlīce? The evidence examined regards chiefly compounds and quasi-compounds (the latter comprising both forms bearing clear derivational affixes and ‘obscured’ compounds, those in which the deuterotheme has lost its lexical independence), though a small amount of evidence in regard to voicing before inflectional suffixes is considered. Evidence is derived from place-names, personal names, and common nouns, on the basis of Modern English standard pronunciation, assimilatory changes in Old English, modern dialect forms, post-Conquest and nonstandard Old English spellings, and analogous conditioning for the loss of OE /x/. A considerable preponderance of the evidence indicates that in compounds as well as in quasi-compounds, fricatives were voiced at the end of the prototheme when a voiced sound followed, but not a voiceless one. It follows from the evidence that there was no general devoicing of fricatives in syllable-final position in Old English, despite Anglo-Saxon scribes' use of for etymological [ɣ] in occasional spellings like and . Old English spellings of this kind need be taken to imply nothing more than a tendency for and to be used interchangeably in noninitial positions, due to the noncontrastive distribution of the sounds they represent everywhere except morpheme-initially. Rare early Middle English spellings of this kind may or may not have a phonological basis, but they cannot plausibly be taken to evidence a phonological process affecting /v, ð, z/.

DOI
11 Apr 2002
TL;DR: In this article, the authors investigated the influence of phonetic identity of the segments involved, context and speaker on the durational variability in Vowel-Consonant-Vowel (VCV) sequences.
Abstract: The paper investigates durational variability in Vowel-Consonant-Vowel (VCV) sequences, due to factors such as the phonetic identity of the segments involved, the context and the speaker. The speech material consisted of real Greek words containing VCV sequences with V=/i,a/ and C=/p,t,s/. The influence of the three factors on the durations of Vowel 1 (V1), Vowel 2 (V2), the Consonant (C), and of the total VCV sequences is examined. In addition, further analyses are reported for VOT and VTT (voice termination time, i.e., carryover voicing during the consonant). The results show a significant influence of all three parameters on the durations examined. The empirical findings are discussed with reference to relevant literature.

Journal ArticleDOI
TL;DR: Hansen et al. as mentioned in this paper examined the use of a prototype testing system that employs synthesized speech to deliver questions on reading and listening comprehension tests for individuals with visual impairments.
Abstract: A \"Self-Voicing\" Test for Individuals with Visual Impairments Eric G. Hansen, Moon J. Lee, Douglas C. Forer For test takers who are visually impaired (that is, are blind or have low vision), using human readers during tests can have several disadvantages. The problems may include an inconsistent quality of reading, the test taker's anxiety and embarrassment at having the reader reread the material, the reader's mistakes in recording answers, fatigue caused by the slowness and intensity of the reader/test-taker interaction, and a greater need for extra testing time. A computer-based testing system that is operable by keyboard input and speech output (synthesized and/or prerecorded) may reduce or eliminate the need for a human reader for some test takers who are visually impaired. This study examined the use of a prototype testing system that employs synthesized speech to deliver questions on reading and listening comprehension tests. The system is termed a \"self-voicing\" test because it provides the speech output capability within a testing application itself, rather through the use of a distinct assistive technology, such as screen reader software. With funding from the Educational Testing Service (ETS), the Graduate Record Examinations (GRE) program, and the Test of English as a Foreign Language (TOEFL) program, researchers investigated the use of speech output technology for tests for individuals with visual impairments.