
Showing papers on "Voice published in 1993"


Journal ArticleDOI
TL;DR: The voiced/voiceless distinction for English utterance-initial stop consonants is primarily realized as differences in the voice onset time, which is largely signaled by the time between the stop burst and the onset of voicing.
Abstract: The voiced/voiceless distinction for English utterance‐initial stop consonants is primarily realized as differences in the voice onset time (VOT), which is largely signaled by the time between the stop burst and the onset of voicing. The voicing of stops has also been shown to affect the vowel’s F0 after release, with voiceless stops being associated with higher F0. When the VOT is ambiguous, these F0 ‘‘perturbations’’ have been shown to affect voicing judgments. This is to be expected of what can be considered a redundant feature, that is, that it should carry a distinction in cases where the primary feature is neutralized. However, when the voicing judgments were made as quickly as possible, an inappropriate F0 was found to slow response time even for unambiguous VOTs. This was true both of F0 contours and level F0 differences. These results reinforce the plausibility of tonogenesis, and they add further weight to the claim that listeners make full use of the signal given to them, even when overt labeling would seem to indicate otherwise.

124 citations
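
The cue-trading relation described here can be made concrete with a toy logistic model (not the authors' model; all weights and boundary values are invented for illustration) in which VOT dominates and onset F0 tips the decision only when VOT sits near the category boundary:

```python
# Toy cue-integration sketch: VOT as the primary cue to stop voicing,
# onset F0 as a weaker, redundant cue.  All numbers are illustrative.
import math

def p_voiceless(vot_ms, f0_onset_hz, vot_boundary=25.0, f0_ref=120.0,
                w_vot=0.4, w_f0=0.02):
    """Probability of a 'voiceless' response given VOT and onset F0."""
    z = w_vot * (vot_ms - vot_boundary) + w_f0 * (f0_onset_hz - f0_ref)
    return 1.0 / (1.0 + math.exp(-z))

# Unambiguous VOT: F0 barely moves the decision.
print(p_voiceless(60, 100), p_voiceless(60, 140))   # both near 1.0
# Ambiguous VOT, near the boundary: F0 tips the percept.
print(p_voiceless(25, 100), p_voiceless(25, 140))   # ~0.40 vs. ~0.60
```

Note that the reaction-time result goes beyond this: even at unambiguous VOTs a mismatched F0 slowed responses, which a pure decision rule like this does not capture.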


Journal ArticleDOI
TL;DR: In this paper, the use of spectral information at vowel onset, which constitutes a stronger cue to the voicing contrast in English than in French, was investigated in French-English bilinguals in order to determine whether the primary language in terms of early experience determines acoustic cue weighting.
Abstract: The use of spectral information at vowel onset, which constitutes a stronger cue to the voicing contrast in English than in French, was investigated in French-English bilinguals in order to determine whether the primary language in terms of early experience determines acoustic cue weighting. The /pen/-/ben/ minimal pair, meaningful in both languages, was used as a base for identification tests, which were presented with either an English or a French precursor word before each token. Two stimulus continua, formed of digitally-edited natural speech tokens, had an identical VOT range but varied in their [en] stem. In their production of the contrast, bilinguals showed clear evidence of code-switching but did not always produce monolingual-like VOTs in their weaker language. In perception, the code-switching effect was significant but small. The bilingual group with English as primary early language showed a greater effect of vowel onset characteristics, in conflicting-cue conditions, than the bilingual group ...

117 citations


Journal ArticleDOI
TL;DR: Four experiments addressing the role of attention in phonetic perception indicate that close attention to the speech signal is necessary for strong acoustic cues to achieve their full impact on phonetic labeling, while weaker acoustic cues (F0 onset frequency and vowel duration) achieve their full impact without close attention.

90 citations


Journal ArticleDOI
TL;DR: The speech of 7 children with phonological disorders was analyzed for imperceptible acoustic distinctions in seemingly homophonous word pairs; a shorter treatment period was observed for subjects judged to have productive knowledge of the contrast being trained, as compared with those who had none.
Abstract: The speech of 7 children with phonological disorders (4 who failed to produce an initial voicing contrast for stops and 3 who failed to produce the alveolar-velar stop contrast) was analyzed for imperceptible acoustic distinctions in seemingly homophonous word pairs. Subjects were audio/video recorded before and during treatment as they produced minimal pairs containing their error and correct sounds. Acoustic measures were VOT and CV locus equations. The presence of acoustic distinctions was taken as evidence for productive knowledge of the sound contrasts. Treatment was applied experimentally and progress was related to pretreatment productive knowledge inferred from acoustic distinctions. A shorter treatment period was observed for subjects judged to have productive knowledge of the contrast being trained, as compared with those who had no knowledge. One of the 4 subjects with initial voicing errors produced an acoustic distinction between voiced and voiceless stops and required the shortest treat...

72 citations
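
Of the two acoustic measures named above, the CV locus equation is the less familiar: a linear regression of F2 at vowel onset on F2 at the vowel midpoint across vowel contexts, whose slope and intercept characterize a stop's place of articulation. A minimal sketch, with invented formant values:

```python
# Fitting a CV locus equation: F2 at vowel onset regressed on F2 at the
# vowel midpoint across vowel contexts.  Formant values are invented.
import numpy as np

# Hypothetical (F2_midpoint, F2_onset) pairs in Hz for alveolar stop targets.
f2_mid   = np.array([2300, 1900, 1500, 1100,  900])
f2_onset = np.array([2100, 1850, 1600, 1400, 1300])

slope, intercept = np.polyfit(f2_mid, f2_onset, 1)
print(f"locus equation: F2_onset = {slope:.2f} * F2_mid + {intercept:.0f} Hz")
# Distinct (slope, intercept) pairs for alveolar vs. velar targets would
# count as acoustic evidence that a child encodes the place contrast.
```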


Journal ArticleDOI
TL;DR: The contextual effects of voiced/voiceless stops on the voice source of an adjacent vowel were examined for the first vowel in ‘CVCV utterances in German, English, Swedish, French, and Italian to yield insights into the control parameters which may be involved in regulating voicing oppositions in these languages.
Abstract: The contextual effects of voiced/voiceless stops on the voice source of an adjacent vowel were examined for the first vowel in 'CVCV utterances in German, English, Swedish, French, and Italian. The principal analysis technique involved interactive inverse filtering and parameterisation of the glottal waveform in terms of a four-parameter voice source model (the LF-model). This analysis procedure was supplemented by measures from narrow-band spectral sections of the speech output and by oral airflow recordings which allow inferences about the relative timing of glottal and supraglottal gestures. Results indicated that the voiced/voiceless nature of the consonant does yield differences in the voice source of the vowel. The most striking effects were found in the context of voiceless consonants, and cross-language differences did emerge in terms of directionality and degree. Extensive anticipatory effects were found for Swedish and for some speakers of English. Preceding the voiceless stop the vowel becomes increasingly breathy-voiced, and it would appear that the glottal abduction gesture is anticipated very early in the course of the vowel. Italian exhibited a similar tendency, though to a considerably lesser degree. The German data, on the other hand, showed certain strong carryover effects: Following the voiceless aspirated stop there was extensive breathy voicing. French showed little contextual variation in either direction. Rather surprisingly, the observed effects were not directly correlated with, or predictable from, the phonetic categories involved (voiced, voiceless unaspirated, and voiceless postaspirated). These results yield insights into the control parameters which may be involved in regulating voicing oppositions in these languages. Whereas the anticipatory effects observed might be consistent with a "timing" model of glottal control, the carryover effects cannot be explained in terms of timing alone and suggest that differences in tension settings of the laryngeal musculature may also be implicated.

54 citations
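
The LF-model named here is a standard four-parameter model of the glottal flow derivative. The sketch below (parameter values invented) enforces the main-excitation constraint E(Te) = -Ee and solves the return-phase constant implicitly, but for brevity skips the usual zero-net-flow constraint that would also fix the open-phase growth rate alpha:

```python
# Simplified LF-model glottal flow derivative.  Tp: flow peak, Te: main
# excitation, Ta: return-phase time constant, Tc: closure, Ee: excitation
# strength.  alpha is taken as given rather than solved for (see lead-in).
import numpy as np

def lf_pulse(fs=16000, Tp=0.004, Te=0.005, Ta=0.0003, Tc=0.008,
             Ee=1.0, alpha=300.0):
    t = np.arange(0.0, Tc, 1.0 / fs)
    wg = np.pi / Tp                       # open-phase sinusoid frequency
    # Solve eps from eps*Ta = 1 - exp(-eps*(Tc - Te)) by fixed-point iteration.
    eps = 1.0 / Ta
    for _ in range(50):
        eps = (1.0 - np.exp(-eps * (Tc - Te))) / Ta
    E0 = -Ee / (np.exp(alpha * Te) * np.sin(wg * Te))   # so that E(Te) = -Ee
    open_phase = E0 * np.exp(alpha * t) * np.sin(wg * t)
    return_phase = -(Ee / (eps * Ta)) * (np.exp(-eps * (t - Te))
                                         - np.exp(-eps * (Tc - Te)))
    return t, np.where(t <= Te, open_phase, return_phase)

t, e = lf_pulse()
print(len(t), round(float(e.min()), 3))   # samples per pulse, approx. -Ee
```

In such a model, the increasingly breathy voicing reported before voiceless stops corresponds chiefly to a larger Ta (a slower return phase, hence steeper spectral tilt).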


Journal ArticleDOI
TL;DR: The results suggest that neutralization occurs when semantic information is present, but that a voicing contrast is realized when it is absent.
Abstract: The present study examined regressive voice assimilation in Catalan in an attempt to determine a systematic explanation of complete versus incomplete voicing neutralization. Two types of contexts were constructed. In one type, semantic information was present to bias the meaning of target words. In the other type, no semantic information was present. The results showed that vowel duration distinguished underlying voicing in the neutral context only. The results suggest that neutralization occurs when semantic information is present, but that a voicing contrast is realized when it is absent.

53 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the basis of language-set effects for short-lag voice onset time (VOT) voicelessness in Spanish/English perceptual sets and find that VOT was not an overriding cue to the voicing feature for these listeners; acoustic analysis, however, did not reveal dimensions that would reliably differentiate the short-lag Spanish /t/ tokens that were predominantly identified as "t" from those that were ambiguous between "t" and "d".

51 citations


Journal ArticleDOI
TL;DR: It was concluded that voicing sibilant phonemes, or word sounds, does cause the subject to adopt the CSS, and that a single sibilant word sound does not give a reliable indication of the smallest speaking vertical dimension.
Abstract: The purpose of this investigation was to determine whether the production of sibilant sounds involved adopting a jaw position that corresponded to the closest vertical speaking space (CSS), by analysis of the smallest vertical excursion of the mandible during the performance of different phonetic exercises. A further objective was to establish the variability in the CSS produced by individual sibilant phonemes. Thirty young adult subjects had their CSS determined during three separate phonetic tests, using a kinesiograph (Sirognathograph, Siemens A.G., Bensheim, Germany) and a Bio-Pak (BioResearch Associates Inc., Milwaukee, WI) jaw-tracking software program. The first test was a general phonetic articulation test containing all the sounds of the English language and specifically including all six sibilant word sounds. The second phonetic test contained the six sibilant sounds making up a short sentence. The third test included six single words, each expressing a different sibilant sound. No statistically significant difference among the mean CSS values determined in each of the three exercises was demonstrable. A phonetic test containing all sibilant sounds produced a CSS equivalent to that of a test containing all speech sounds. The vertical component of the CSS was also independent of the form or duration of the phonetic tests containing the sibilant word sounds used in this investigation. The CSS determined for 5 of the individual sibilant phonemes in the third exercise differed (p < 0.05) from that calculated for the three complete exercises. It was concluded that voicing sibilant phonemes, or word sounds, does cause the subject to adopt the CSS. (ABSTRACT TRUNCATED AT 250 WORDS)

46 citations


Journal ArticleDOI
TL;DR: Normal subjects' VOTs were significantly shorter at the fast rate of speech relative to the slow/normal rate, as expected, and the nonfluent aphasic patients produced voiced and voiceless consonants with somewhat overlapping VOT distributions, indicating an impairment in temporal integration in these subjects.

36 citations


Journal ArticleDOI
TL;DR: The interaction between lexical acquisition and acquisition of initial voiceless stops was studied in two normally developing children by acoustically examining the token-by-token accuracy of initial voiceless stop targets in different lexical items.
Abstract: The interaction between lexical acquisition and acquisition of initial voiceless stops was studied in two normally developing children, aged 1;9 and 1;10, by acoustically examining the token-by-token accuracy of initial voiceless stop targets in different lexical items. Production accuracy was also examined as it related to the frequency of usage of different words, as well as the time when they entered the children's lexicons. Fewer than half of the words in the children's lexicons had tokens representing the emergence of accurate voiceless stop production prior to the session at which the voicing contrast was achieved. These words were primarily ‘old’ words that had been in the children's lexicons from the beginning of data collection, as opposed to ‘new’ words, first produced in later recording sessions. Findings are discussed in reference to the ‘lexical diffusion’ model of sound change and within the framework of nonlinear underspecification theory.

30 citations


Journal ArticleDOI
TL;DR: The findings are interpreted as supporting the hypothesis that speakers use their hearing to calibrate mechanisms of speech production by monitoring the relations between their articulations and their acoustic output.
Abstract: Voice‐onset time (VOT) and syllable duration were measured for the English plosives in /Cʌd/ (C=consonant) context spoken by four postlingually deafened recipients of multichannel (Ineraid) cochlear implants. Recordings were made of their speech before, and at intervals following, activation of the speech processors of their implants. Three patients reduced mean syllable duration following activation. Using measures of VOT and syllable duration from speakers with normal hearing [Volaitis and Miller, J. Acoust. Soc. Am. 92, 723–735 (1992)] and from the subjects of this study, VOT is shown to vary approximately linearly with syllable duration over the ranges produced here. Therefore, the VOT of each token was adjusted for the change in syllable duration of that token relative to the mean syllable duration in the first baseline session. This variable, labeled VOTc, was used to evaluate the effects on voicing of the speakers’ renewed access to the voicing contrast provided by their implants. Preimplant, all four speakers characteristically uttered voiced plosives with too‐short VOT, compared to the measures for hearing speakers. Voiceless plosive mean VOT was also abnormally short for two of the speakers, and close to normal for the remaining two. With some hearing restored, subjects made relatively few errors with respect to voicing when identifying plosives in listening tests, and three of the four speakers lengthened VOTc. The findings are interpreted as supporting the hypothesis that speakers use their hearing to calibrate mechanisms of speech production by monitoring the relations between their articulations and their acoustic output.
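
The duration adjustment reduces to one line of arithmetic: given the approximately linear VOT-syllable-duration relation, each token's VOT is projected back to the baseline mean syllable duration. A sketch under that assumption (the slope and the measurements are invented, not values from the paper):

```python
# Sketch of the VOTc adjustment: correct each token's VOT for the deviation
# of its syllable duration from the baseline mean, using an assumed linear
# VOT-vs-duration slope.  All numbers are invented for illustration.
def vot_adjusted(vot_ms, syl_dur_ms, baseline_mean_dur_ms, slope=0.15):
    """VOTc: token VOT re-expressed at the baseline syllable duration."""
    return vot_ms - slope * (syl_dur_ms - baseline_mean_dur_ms)

# A post-activation token whose syllable shortened from a 400-ms baseline:
print(vot_adjusted(vot_ms=55, syl_dur_ms=320, baseline_mean_dur_ms=400))
# -> 67.0 ms: the VOT this token would have at the baseline duration.
```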

01 Sep 1993
TL;DR: The authors reconstruct the Kiranti, or East Himalayish, subgroup of Tibeto-Burman, with divergent evolutions depending on the place of articulation; the Eastern group shows a Germanic-style devoicing, with the voicing opposition transphonologised into aspiration.
Abstract: Phonological reconstruction of the Kiranti or East Himalayish subgroup of Tibeto-Burman. Voiced and unvoiced series are reconstructed, with divergent evolutions depending on the place of articulation. The Eastern group shows a Germanic devoicing, with the voicing opposition transphonologised into aspiration. Development of a partial glottal series (ɓ, ɗ) is proposed to explain the voicing "flip-flop" observed between western and southern subgroups in the dental and bilabial orders.

Journal ArticleDOI
TL;DR: The authors measured formant transitions in terms of the articulatory kinematics involved in moving from a consonant to a vowel and found that the formant onsets of voiceless fricatives are more dependent on vowel context than those of voiced fricatives.
Abstract: Formant transitions provide context‐dependent acoustic cues that can be interpreted in terms of the articulatory kinematics involved in moving from a consonant to a vowel. Formant frequencies were measured at identified acoustic landmarks for eight English fricatives preceding front, back, and back‐rounded vowels. Formant onset times designated the point when the energy increased most rapidly and evidence of the first formant was first observed. Comparing the two‐dimensional representation of F2×F3 onset frequencies along the voicing dimension showed the voiceless fricatives to be more dependent on vowel context. The onset frequencies for voiced fricatives reflect a more extreme supraglottal posture, while the voiceless fricative measures can be considered to be at a point closer to the vowel because voicing begins at a later time relative to the oral release gesture. Formant structure in the noise before the release, to the extent that it is visible in the consonantal interval prior to voicing onset, can...
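
The measurement pipeline implied here can be sketched: find the vowel onset as the frame where energy rises fastest, then estimate formants at that frame from LPC root angles. The LPC order, window sizes, and frequency floor below are illustrative choices, not the authors' settings:

```python
# Sketch: locate the onset frame by the steepest energy rise, then read
# formant estimates there from the angles of LPC polynomial roots.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    """Autocorrelation-method LPC: polynomial [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formants_at(frame, fs):
    """Rough formant estimates (Hz) at one frame, lowest first."""
    roots = np.roots(lpc(frame * np.hamming(len(frame))))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[freqs > 90][:3]               # drop near-DC roots; F1-F3

def onset_frame(x, fs, win=0.010, hop=0.005):
    """Start sample of the frame where short-time energy rises fastest."""
    w, h = int(win * fs), int(hop * fs)
    e = np.array([np.sum(x[i:i + w] ** 2) for i in range(0, len(x) - w, h)])
    return (int(np.argmax(np.diff(e))) + 1) * h, w

# Usage on a fricative-vowel token: i, w = onset_frame(x, fs)
# print(formants_at(x[i:i + w], fs))           # F1-F3 near vowel onset
```

A production system would additionally screen root candidates by bandwidth; this sketch keeps only the frequency filter.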

Journal ArticleDOI
TL;DR: In this paper, the authors investigated durational differences in syllable-final nasals in Japanese, English, and Korean, and examined the mora hypothesis for the mora nasal /n/ in Japanese.
Abstract: This study investigates durational differences in syllable-final nasals in Japanese, English, and Korean, and examines the mora hypothesis in Japanese. The phenomenon that syllable-final nasals are longer when followed by a voiced consonant than when followed by a voiceless consonant was observed in languages of different timing categories, i.e. Japanese (mora-timed), Korean (syllable-timed), and English (stress-timed). However, syllable-final nasals in Japanese (the mora nasal /n/) are set apart with respect to their moraic status: syllable-final nasals (moraic) are clearly differentiated in duration from syllable-initial nasals (non-moraic).


Proceedings ArticleDOI
27 Apr 1993
TL;DR: A speech training system for deaf children which integrates acoustic and several types of instrumentally measured articulatory data: palatography, nasal vibration, airflow, and the presence/absence of voicing is described.
Abstract: The authors describe a speech training system for deaf children which integrates acoustic and several types of instrumentally measured articulatory data: palatography, nasal vibration, airflow, and the presence/absence of voicing. The system presents these in both a technical and a motivating game format. It was designed to be used both with a teacher's guidance and (within certain limits) by the children alone. The games are proving to be highly motivating to the children, and encourage them to experiment with their speech production.

Dissertation
01 Dec 1993
TL;DR: In this paper, an experimental study of two major distinctive features, emphasis and voicing, in the plosives of Yemeni Spoken Arabic is presented, investigating some of their acoustic, perceptual, and aerodynamic correlates and aiming, at least in part, to identify the language-specific aspects of these phonetic phenomena.
Abstract: This is an experimental study of two major distinctive features: emphasis and voicing in the plosives of Yemeni Spoken Arabic. It investigates some of their acoustic, perceptual and aerodynamic correlates and aims, at least in part, to identify the language-specific aspects insofar as these phonetic phenomena are concerned. It falls into two related parts. Part One consists of two chapters. Chapter one gives a general background of YSA with special reference to the phonemic significance of emphasis and voicing in the plosives and their interaction with various contextual factors and positions. Phonological definitions of these features are given. Various theoretical approaches are also dealt with. The syllable structure and the stress patterns in both Modern Standard Arabic and Yemeni Spoken Arabic are presented. Chapter two reviews critically some of the hypotheses and interpretations of voicing mechanisms and the factors affecting their realizations in various languages. Some of the relevant aspects reviewed are voice onset time in various languages, formant transitions, closure durations, temporal relationships between consonants and vowels, categorical perception and the phoneme boundary, and aerodynamic factors and their role in the production of plosives. The two features are also reviewed in relation to vocalic context, place of articulation, stress, gemination and phonetic position. Part Two consists of four chapters representing the main body of this study. Chapter three is an investigation of the acoustic characteristics of the voiced/voiceless and emphatic/nonemphatic categories in words embedded in a contextual frame sentence. Chapter four is a perceptual investigation of the above contrasts by means of synthetically generated speech using the Klatt Synthesizer. It examines the role played by VOT, the relative onset time between the release and the onset of voicing, in the accurate identification of the voicing cognates. Another experiment attempts to evaluate the role of the second formant, particularly its onset frequency and steady-state portion, in the emphatic/nonemphatic distinction. The relationships between perception and production are described and the theory of 'categorical perception' in relation to our data is also discussed. Chapter five investigates aerodynamic patterns and aerodynamically derived estimates of articulation for the emphatic/nonemphatic and the voiced/voiceless consonants in two experiments. Since there are several variables involved in this investigation, the results in both experiments are subjected to analyses of variance to obtain the effects of the independent variables on the dependent ones. In chapter six the findings of the previous three chapters are summarized. Some implications for foreign language teaching and learning are also discussed. The study ends with a section on the limitations and suggestions for future research.

Journal ArticleDOI
TL;DR: The authors examined the contextual effects on VOT in Greek of the post-consonantal vowel, the stress pattern, and the distance of the stress from the initial stop consonant.
Abstract: Although Lisker and Abramson (1967) found no effect of the following vowel on the VOT of a stop consonant, Port and Rotunno (1979) found VOT to have greater values for voiceless stops followed by tense than by lax vowels. The purpose of the present study was to obtain a complete database on the VOT characteristics of voiced and voiceless initial stop consonants in Greek, and to examine the contextual effects on VOT of the post‐consonantal vowel, the stress pattern, and the distance of the stress from the initial stop consonant. The question here was whether the vowel effects found by Port and Rotunno for English would be seen in Greek, a language whose two stop categories have voicing lead and medium lag. Speakers read isolated disyllabic and trisyllabic words of four stress patterns. The utterance‐initial stops /p, t, k, b, d, g/ were followed by the five vowels of Greek, /a, e, i, o, u/. Results indicated that both voicing lead and voicing lag increased for stops followed by higher than by lower v...

Journal ArticleDOI
TL;DR: The analysis and testing have led to several conclusions concerning the control of the articulators for this speaker: production of obstruent consonants was a particular problem, whereas sonorant consonants were less of a problem (70% correct), and vowel errors were less prevalent.
Abstract: One practical aim of this research is to determine how best to use speech recognition techniques for augmenting the communication abilities of dysarthric speakers. As a first step toward this goal, the following kinds of analyses and tests have been performed on words produced by several dysarthric speakers: a closed‐set intelligibility test based on Kent et al. [J. Speech Hear. Disord. 54, 482–499 (1989)]; an open intelligibility test; critical listening and transcription; acoustic analysis of selected utterances; and an evaluation of the recognition of words by a commercial speech recognizer. The data from one speaker have been examined in detail. The analysis and testing have led to several conclusions concerning the control of the articulators for this speaker: production of obstruent consonants was a particular problem (only 30% of syllable‐initial obstruents were produced with no error), whereas sonorant consonants were less of a problem (70% correct). Of the obstruent errors, most were voicing errors, but place errors for alveolars (particularly fricatives) were also high, and these consonants were produced inconsistently, as inferred from acoustic analysis and from low scores from the recognizer for words with these consonants. In comparison, vowel errors were less prevalent. Implications for the use of a speech recognizer for augmenting this speaker’s communication abilities are discussed.

Patent
03 Sep 1993
TL;DR: In this paper, a phoneme model is used at the tail of a word and a diphone model elsewhere, in order to recognize continuous speech produced by voicing words in succession.
Abstract: PURPOSE: To accurately recognize even speech produced by voicing words continuously, without increasing the processing load at word boundaries, when using context-dependent recognition units. CONSTITUTION: The recognition units are diphones, obtained by subdividing each phoneme according to the following phoneme, together with phonemes that do not depend on the following phoneme. A word dictionary 3 is constructed so as to use a phoneme model at the tail of a word and a diphone model elsewhere. A recognition network 4 is generated from the word dictionary, model parameters, and grammatical information to recognize continuous speech. The parameters of each phoneme model are found by averaging the parameters of its diphone models.
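
A minimal sketch of the dictionary scheme this describes, with invented data structures: diphone models word-internally, a context-independent phoneme model at the word tail, the latter's parameters averaged over its diphone models:

```python
# Invented illustration of the patent's dictionary scheme: diphone units
# word-internally, an averaged phoneme model at the word tail.
import numpy as np

# Hypothetical diphone model parameters, keyed "phoneme+following-phoneme".
diphone_params = {
    "k+a": np.array([1.0, 0.2]), "a+t": np.array([0.5, 0.9]),
    "t+a": np.array([0.3, 0.7]), "a+k": np.array([0.6, 0.8]),
}

def phoneme_model(ph):
    """Word-tail phoneme model: average of that phoneme's diphone models."""
    ps = [v for k, v in diphone_params.items() if k.startswith(ph + "+")]
    return np.mean(ps, axis=0)

def word_units(phonemes):
    """Diphone units word-internally, a phoneme model at the word tail."""
    diphones = [f"{a}+{b}" for a, b in zip(phonemes, phonemes[1:])]
    return diphones, phoneme_model(phonemes[-1])

units, tail = word_units(["k", "a", "t", "a"])
print(units, tail)   # ['k+a', 'a+t', 't+a'] plus averaged model for final /a/
```

This keeps word-boundary processing cheap: the tail unit does not depend on the next word's initial phoneme.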

Patent
13 Jul 1993
TL;DR: In this article, the authors present a speech answering control for a speech recognition and answering system with an ultterance speed measuring instrument 16 which measures the utterance speed of the input speech and an answering control part 17 which prepares expression forms differing in the number of characters for answer sentences having the same meaning and controls the expression forms of the answer sentence voiced by the voice answer output part 18 according to the voicing speed measured by the measuring device 16.
Abstract: PURPOSE: To provide a highly practical speech recognizing and answering device which enables natural, smooth interaction between a human being and a machine and processes information through effective speech input. CONSTITUTION: The device, which recognizes an input speech with a speech recognition part 13 and utters an answer sentence for the recognition result from a speech answer output part 18, is equipped with an utterance-speed measuring instrument 16 that measures the utterance speed of the input speech, and a speech answering control part 17 that prepares expression forms differing in the number of characters for answer sentences having the same meaning and controls the expression form of the answer sentence uttered by the voice answer output part 18 according to the speed measured by the utterance-speed measuring instrument 16.
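
The control logic amounts to a lookup keyed by measured speaking rate; a minimal sketch, with invented thresholds (in syllables per second) and answer texts:

```python
# Invented illustration of the patent's idea: several expression forms of
# the same answer, differing in length, chosen by the user's speaking rate.
ANSWER_FORMS = {
    "full":  "The departure time of that train is 9:30 in the morning.",
    "plain": "It leaves at 9:30 a.m.",
    "terse": "9:30 a.m.",
}

def select_answer(speech_rate, slow=4.0, fast=7.0):
    """Fast speakers get shorter answers; slow speakers get fuller ones."""
    if speech_rate >= fast:
        return ANSWER_FORMS["terse"]
    if speech_rate <= slow:
        return ANSWER_FORMS["full"]
    return ANSWER_FORMS["plain"]

print(select_answer(3.5))   # slow speaker -> full-sentence answer
print(select_answer(8.2))   # fast speaker -> terse answer
```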

Journal ArticleDOI
TL;DR: In this paper, an algorithm was proposed to locate landmarks caused by closures and releases of obstruent consonants (sounds produced with a pressure buildup behind a constriction) flanked by sonorants.
Abstract: Locating landmarks, or acoustically important points, in an utterance is the first step in a proposed method for feature‐based speech recognition. The algorithm developed here is designed to locate landmarks caused by closures and releases of obstruent consonants (sounds produced with a pressure buildup behind a constriction) flanked by sonorants. Two characteristics of obstruents are: (1) voicing diminishes or stops completely at the onset and (2) noise is generated during the constricted interval (at the release in the case of stops and affricates). The algorithm thus monitors voicing changes by keeping track of low‐frequency signal energy and locates a landmark wherever a rapid change occurs. It also monitors higher frequencies for the presence of noise to aid in the detection of voiceless stop and affricate releases. With appropriate selection of time windows, smoothing intervals, and frequency bands, sonorant/obstruent boundaries for stops, fricatives, and affricates could be detected with only a few percent error. Semivowels and creaky voicing sometimes mistakenly cause a landmark to be detected, but more detailed analysis of the characteristics of these erroneous landmarks may overcome this problem. [Work supported by NSF.]
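
A rough sketch of the core loop, with illustrative band edges, window sizes, and threshold rather than the paper's values:

```python
# Landmark sketch: track low-band (voicing) energy for abrupt changes and
# keep a high band available for noise checks.  Parameters are illustrative.
import numpy as np
from scipy.signal import butter, sosfilt

def band_energy_db(x, fs, lo, hi, win=0.016, hop=0.008):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos, x)
    w, h = int(win * fs), int(hop * fs)
    e = [10 * np.log10(np.sum(y[i:i + w] ** 2) + 1e-12)
         for i in range(0, len(y) - w, h)]
    return np.array(e), h

def landmarks(x, fs, thresh_db=9.0):
    """(time_s, label) pairs where low-band energy changes rapidly."""
    e_low, hop = band_energy_db(x, fs, 100, 400)    # voicing band
    e_high, _ = band_energy_db(x, fs, 2000, 6000)   # frication band (unused here)
    d = np.diff(e_low)
    hits = np.flatnonzero(np.abs(d) > thresh_db)
    return [(i * hop / fs, "release" if d[i] > 0 else "closure") for i in hits]

# Usage: x, fs = <utterance>; print(landmarks(x, fs))
# A fuller version would consult e_high to confirm voiceless stop and
# affricate releases, as the abstract describes.
```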

Journal ArticleDOI
TL;DR: Results showed distinct deficit profiles for each subject, consisting of patterns of defective stimulus control relations underlying persistent substitution between voiced and unvoiced consonants in the speech and writing of two children.
Abstract: This study attempted to analyze defective stimulus control relations underlying persistent substitution between voiced and unvoiced consonants in the speech and writing of two children. A series of 20 tests was administered repeatedly. Some tests consisted of matching-to-sample tasks, with dictated words, printed words, or pictures as samples. Comparison stimuli were arranged in pairs of printed words or pictures, such that the only difference in their corresponding spoken words was the voicing of one consonant phoneme. In other tests, a stimulus (dictated word, printed word, or picture) was presented, and the subject was required to emit an oral response (repeat the dictated word, read the printed word, or name the picture) or a written response (write to dictation, copy the word, or write a picture name). Other tests required the subjects to make a same/different distinction in pairs of dictated words that did or did not differ in the voicing of a single phoneme. Results showed distinct deficit profiles for each subject, consisting of patterns of defective stimulus control relations. The subjects were able, however, to distinguish between voiced and unvoiced sounds and to produce these sounds.

Journal ArticleDOI
TL;DR: This article applied recurrent neural networks to the processing of time-warped sequences, in particular modelling how listeners distinguish between phonetic categories in the context of changing speech rate, and applied a more detailed speech representation to model the effects of both speaking rate and syllable structure.
Abstract: We apply recurrent neural networks to the processing of time-warped sequences, particularly, modelling how listeners distinguish between phonetic categories in the context of changing speech rate. In an earlier paper (AbuBakar & Chater, 1993), we modelled the effects of speaking rate on the perception of voicing contrasts specified by voice-onset-time (VOT) in syllable-initial stop consonants using a simple coding procedure. In the present investigation, we apply a more detailed speech representation to model the effects of both speaking rate and syllable structure on the syllable-initial distinction between a stop consonant /b/ and a semivowel /w/ cued by the duration of the formant transitions. In the first set of experiments, we constructed nine pairs of /ba/-/wa/ syllables varying in syllable duration and transition values. In another set of experiments, we compressed these syllables and added syllable-final transitions appropriate for a stop consonant /d/ to produce a second set of syllables (/bad/-/wad/...
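
For concreteness, here is a toy Elman-style forward pass showing how a recurrent network naturally consumes variable-length, i.e. time-warped, input. The weights are random and training is omitted, so the printed outputs are arbitrary; it is a structure sketch only, not the authors' network:

```python
# Toy Elman network forward pass over variable-length acoustic frames.
# Untrained (random weights): outputs are arbitrary; structure sketch only.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 8                    # e.g., F1, F2, amplitude per frame
W_in  = rng.normal(0, 0.5, (n_hid, n_in))
W_rec = rng.normal(0, 0.5, (n_hid, n_hid))
w_out = rng.normal(0, 0.5, n_hid)

def run_rnn(frames):
    """One step per frame; final hidden state scored as P(/b/ vs. /w/)."""
    h = np.zeros(n_hid)
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h)
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))

fast_transition = rng.normal(size=(8, n_in))    # short sequence, /b/-like
slow_transition = rng.normal(size=(25, n_in))   # long sequence, /w/-like
print(run_rnn(fast_transition), run_rnn(slow_transition))
```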

Patent
26 Mar 1993
TL;DR: In this paper, the authors simplify the unit-selection process by using tables that, for each unit speech used in synthesizing an arbitrary word, hold data selected from diverse continuous utterances, thereby reducing unnaturalness.
Abstract: PURPOSE: To simplify the selection process, by means of various tables, when unnaturalness is reduced by using data selected from diverse continuous utterances for each of the unit speeches used in speech synthesis for synthesizing an arbitrary word. CONSTITUTION: Speech parameters obtained by analyzing natural speech voiced continuously in advance, the correspondence between unit speeches and the speech parameters, and the phoneme series of the utterances are stored in a unit speech data table 4. The table is consulted for each unit speech, according to the phoneme and rhythm information of an input character string, to select the best unit speech data according to a determined selection criterion; the speech parameters of the selected unit speech data, extracted (6) from the speech parameters 7 according to the information in the unit speech data table, are then used to synthesize the speech.
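
A minimal sketch of the table-driven selection, with invented units, contexts, and parameters:

```python
# Invented illustration of the patent's unit speech data table: each unit
# maps to candidate parameter sets taken from diverse continuous utterances,
# tagged with the phoneme context in which each was originally voiced.
UNIT_TABLE = {
    "a": [{"params": [1.0, 0.3], "context": ("k", "t")},
          {"params": [0.9, 0.5], "context": ("t", "#")}],
    "t": [{"params": [0.2, 0.8], "context": ("a", "a")},
          {"params": [0.4, 0.6], "context": ("#", "o")}],
}

def select_unit(unit, left, right):
    """Pick the candidate whose recorded context best matches the target."""
    def score(cand):
        cl, cr = cand["context"]
        return (cl == left) + (cr == right)
    return max(UNIT_TABLE[unit], key=score)["params"]

# Parameters for the (hypothetical) target sequence /a t a/ in context:
targets  = ["a", "t", "a"]
contexts = [("k", "t"), ("a", "a"), ("t", "#")]
print([select_unit(u, l, r) for u, (l, r) in zip(targets, contexts)])
```

Because each candidate was excised from real connected speech, choosing the context-matching one reduces the unnaturalness a single fixed unit per phoneme would cause.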

Journal ArticleDOI
TL;DR: This article investigated the difficulty experienced by American-English listeners in identifying Hindi dental and retroflex stop consonants in different voicing conditions, and found that the difficulty was affected by which speaker produced the contrasts and, to a lesser extent, by the vowel context.
Abstract: Training listeners to perceive consonantal contrasts that do not occur in their native language has proved to be difficult. Cross‐language training studies usually produce about 10% improvement in performance. This improvement has not transferred to related material in different linguistic contexts. The present research had four aims: (1) to investigate the difficulty experienced by American‐English listeners in identifying Hindi dental and retroflex stop consonants in different voicing conditions, (2) to test a new, computer‐based, interactive training method, (3) to examine transfer of training to new voicing conditions, to a new vowel context, and to the voice of a new speaker, and (4) to test the hypothesis that increasing stimulus variability (in this case, training with one versus two voicing conditions) increases transfer of training. Subjects had differential difficulty identifying dental versus retroflex consonants produced in different voicing conditions. Further, this relative difficulty was affected by which speaker produced the contrasts and, to a lesser extent, by the vowel context. The computer‐based training improved subjects’ consonant identification. However, this improvement showed little transfer to new stimuli. Finally, increasing stimulus variability during training did not affect transfer of training. [Work supported by NIDCD.]


Journal Article
TL;DR: The MLP-based pattern element aid gave significantly better performance in the reception of consonantal voicing contrasts from speech in pink noise than that achieved with conventional amplification; consequently, it also gave better overall performance in audio-visual consonant identification.
Abstract: Two new developments in speech pattern processing hearing aids will be described. The first development is the use of compound speech pattern coding. Speech information which is invisible to the lipreader was encoded in terms of three acoustic speech factors; the voice fundamental frequency pattern, coded as a sinusoid, the presence of aperiodic excitation, coded as a low-frequency noise, and the wide-band amplitude envelope, coded by amplitude modulation of the sinusoid and noise signals. Each element of the compound stimulus was individually matched in frequency and intensity to the listener's receptive range. Audio-visual speech receptive assessments in five profoundly hearing-impaired listeners were performed to examine the contributions of adding voiceless and amplitude information to the voice fundamental frequency pattern, and to compare these codings to amplified speech. In both consonant recognition and connected discourse tracking (CDT), all five subjects showed an advantage from the addition of amplitude information to the fundamental frequency pattern. In consonant identification, all five subjects showed further improvements in performance when voiceless speech excitation was additionally encoded together with amplitude information, but this effect was not found in CDT. The addition of voiceless information to voice fundamental frequency information did not improve performance in the absence of amplitude information. Three of the subjects performed significantly better in at least one of the compound speech pattern conditions than with amplified speech, while the other two performed similarly with amplified speech and the best compound speech pattern condition. The three speech pattern elements encoded here may represent a near-optimal basis for an acoustic aid to lipreading for this group of listeners. The second development is the use of a trained multi-layer-perceptron (MLP) pattern classification algorithm as the basis for a robust real-time voice fundamental frequency extractor. This algorithm runs on a low-power digital signal processor which can be incorporated in a wearable hearing aid. Aided lipreading for speech in noise was assessed in the same five profoundly hearing-impaired listeners to compare the benefits of conventional hearing aids with those of an aid which provided MLP-based fundamental frequency information together with speech+noise amplitude information. The MLP-based pattern element aid gave significantly better performance in the reception of consonantal voicing contrasts from speech in pink noise than that achieved with conventional amplification and consequently, it also gave better overall performance in audio-visual consonant identification.(ABSTRACT TRUNCATED AT 400 WORDS)
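
The compound coding itself is easy to sketch: F0 rendered as a sinusoid, voiceless excitation as low-frequency noise, and both amplitude-modulated by the wide-band envelope. The frame rate, noise smoothing, and contours below are invented, and a real aid would additionally map frequency and intensity into the listener's receptive range:

```python
# Sketch of compound speech-pattern coding: per-frame F0 (Hz), amplitude
# envelope (0-1), and a voicing flag are rendered as a sinusoid (voiced)
# or smoothed noise (voiceless), both amplitude-modulated.
import numpy as np

def encode(f0_hz, amp, voiced, fs=16000, frame_s=0.010):
    n = int(frame_s * fs)
    rng = np.random.default_rng(0)
    out, phase = [], 0.0
    for f0, a, v in zip(f0_hz, amp, voiced):
        if v:                                  # periodic: F0 as a sinusoid
            ph = phase + 2 * np.pi * f0 * np.arange(1, n + 1) / fs
            out.append(a * np.sin(ph))
            phase = ph[-1] % (2 * np.pi)
        else:                                  # aperiodic: crude low-pass noise
            noise = np.convolve(rng.normal(size=n), np.ones(32) / 32, "same")
            out.append(a * noise)
    return np.concatenate(out)

# One second: falling F0 with a voiceless stretch mid-utterance.
f0     = np.linspace(180, 120, 100)
amp    = np.hanning(100)
voiced = np.array([True] * 40 + [False] * 20 + [True] * 40)
print(encode(f0, amp, voiced).shape)           # (16000,)
```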