Journal ArticleDOI

Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants

03 Aug 2001 - Journal of the Acoustical Society of America (Acoustical Society of America) - Vol. 110, Iss. 2, pp. 1150-1163
TL;DR: The results quantify the effect of number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant.
Abstract: Speech recognition was measured as a function of spectral resolution (number of spectral channels) and speech-to-noise ratio in normal-hearing (NH) and cochlear-implant (CI) listeners. Vowel, consonant, word, and sentence recognition were measured in five normal-hearing listeners, ten listeners with the Nucleus-22 cochlear implant, and nine listeners with the Advanced Bionics Clarion cochlear implant. Recognition was measured as a function of the number of spectral channels (noise bands or electrodes) at signal-to-noise ratios of +15, +10, +5, and 0 dB, and in quiet. Performance with three different speech processing strategies (SPEAK, CIS, and SAS) was similar across all conditions, and improved as the number of electrodes increased (up to seven or eight). At all noise levels, vowel and consonant recognition with the SPEAK speech processor did not improve with more than seven electrodes, while for normal-hearing listeners performance continued to increase up to at least 20 channels. On more difficult speech materials (word and sentence recognition), Nucleus-22 listeners showed a marginally significant increase from seven to ten electrodes. The average implant score with all processing strategies was poorer than the scores of NH listeners with similar processing. However, the best CI scores were similar to the normal-hearing scores for that condition (up to seven channels). The best-performing CI listeners improved as the number of electrodes increased up to seven, while CI listeners with low levels of speech recognition showed no improvement beyond four electrodes. These results quantify the effect of the number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant.
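For readers unfamiliar with the noise-band simulation used with the normal-hearing listeners: it follows the channel-vocoder approach of Shannon et al. (1995; see References), in which band-specific temporal envelopes modulate noise carriers of the same bandwidths, so that the number of bands sets the number of spectral channels. Below is a minimal sketch of that style of processing, with illustrative choices (log-spaced band edges, 4th-order Butterworth filters, 160 Hz envelope smoothing) rather than the study's exact parameters:

```python
# Minimal noise-band vocoder sketch. Filter orders, band spacing, and the
# envelope cutoff are illustrative, not the parameters of Friesen et al. (2001).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels, f_lo=200.0, f_hi=7000.0):
    """Reduce speech (float array) to n_channels envelope-modulated noise bands."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    env_lp = butter(2, 160.0, btype="lowpass", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)
        env = sosfiltfilt(env_lp, np.abs(hilbert(band)))  # smoothed envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
        noise_band = env * carrier
        # Match each output band's level to the original band's level.
        noise_band *= np.linalg.norm(band) / (np.linalg.norm(noise_band) + 1e-12)
        out += noise_band
    return out
```

Listening to the output of such a vocoder at 4, 8, 16, and 20 channels gives a rough feel for the spectral-resolution manipulation reported above.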


Citations
Journal ArticleDOI
TL;DR: The aims of this paper are to provide a brief history of cochlear implants, present a status report on the current state of implant engineering and the levels of speech understanding enabled by that engineering, describe limitations of current signal processing strategies, and suggest new directions for research.

646 citations


Cites background from "Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants"

  • ...Patients with low speech reception scores generally do not have more than four effective channels for any test, whereas patients with high scores may have as many as eight or slightly more channels depending on the test (e.g., Friesen et al., 2001; Dorman and Spahr, 2006)....


  • ...Other investigators have found that even more channels are needed for asymptotic performance, especially for difficult tests such as identification of vowels or recognition of speech presented in competition with noise or a multi-talker babble (Friesen et al., 2001; Shannon et al., 2004)....


  • ...For example, Friesen et al. (2001) found that identification of vowels for listeners with normal hearing continued to improve with the addition of channels in the acoustic simulations up to the tested limit of 20 channels, for vowels presented in quiet and at progressively worse S/Ns out to and…


  • ...Present evidence suggests, however, that no more than 4–8 independent sites are available in a speech-processor context and using present electrode designs, even for arrays with as many as 22 electrodes (Lawson et al., 1996; Fishman et al., 1997; Wilson, 1997; Kiefer et al., 2000; Friesen et al., 2001; Garnham et al., 2002)....


Journal ArticleDOI
TL;DR: Surgical strategies for hearing preservation with a short hybrid cochlear implant are outlined, and the benefits of preserved residual low-frequency hearing for word understanding in noise and music appreciation are described.
Abstract: Objectives/Hypothesis: This study documents the importance of preserving residual low-frequency acoustic hearing as those with more residual hearing are selected for cochlear implantation. Surgical strategies used for hearing preservation with a short hybrid cochlear implant are outlined. The benefits of preserved residual low-frequency hearing, improved word understanding in noise, and music appreciation are described. Study Design: Multicenter, prospective, single-subject design. Methods: Records were reviewed for 21 individuals participating in a Food and Drug Administration (FDA) feasibility clinical trial who received an Iowa/Nucleus 10 mm electrode. A second group of subjects, who received implants at the University of Iowa and had used the 10 mm device for between 6 months and 2 years, was also reviewed. Outcome measures included standardized tests of monosyllabic word understanding, spondees in noise, and common melody recognition. Results: Low-frequency hearing was maintained in all individuals immediately postoperatively. One subject lost hearing at 2.5 months postoperatively after a viral infection. The group averaged a loss of 9 dB of low-frequency acoustic hearing between 125 and 1,000 Hz. Monosyllabic word understanding scores at 6 months for the group followed in the FDA clinical trial using the implant plus hearing aids were 69% correct. For the long-term group receiving implants at Iowa, monosyllabic word understanding in those who had used the device between 6 months and 2 years was 79%. Other important findings include improved recognition of speech in noise (a 9 dB improvement) compared with standard cochlear implant recipients matched for speech recognition in quiet, and near-normal recognition of common melodies. Conclusion: The surgical strategies outlined were successful in preserving low-frequency hearing in 96% of individuals. Combined electrical and acoustical speech processing has enabled this group of volunteers to gain improved word understanding compared with their preoperative hearing with bilateral hearing aids and compared with a group of individuals receiving a standard cochlear implant with similar experience with their device. The improvement in speech in noise and melody recognition is attributed to the ability to distinguish fine pitch differences as a result of preserved residual low-frequency acoustic hearing. Preservation of low-frequency acoustic hearing is important for improving speech in noise and music appreciation for the hearing impaired, both of which matter in real-life situations. Key Words: hearing preservation, cochlear implant, hybrid cochlear implant, hearing in noise. Laryngoscope, 115:796-802, 2005

461 citations


Cites background from "Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants"

  • ...Even the most successful implant users only realize perhaps 6 to 8 channels of distinct “place-frequency” information across the entire spectral range, and this deficit in spectral resolution has a direct negative consequence on the implant user’s ability to understand speech in background noise.(4) Thus, although traditional CIs are quite successful for many, if not most, patients in restoring excellent speech recognition in quiet, even the most successful implant patients suffer from significant problems understanding speech in background noise....


Book
19 Sep 2014
TL;DR: In this paper, the authors review behavioural and neuroimaging studies of face-voice integration in the context of person perception and find evidence for interference between facial and vocal information during affect recognition or identity processing.
Abstract: Integration of information from face and voice plays a central role in our social interactions. It has been mostly studied in the context of audiovisual speech perception: integration of affective or identity information has received comparatively little scientific attention. Here, we review behavioural and neuroimaging studies of face-voice integration in the context of person perception. Clear evidence for interference between facial and vocal information has been observed during affect recognition or identity processing. Integration effects on cerebral activity are apparent both at the level of heteromodal cortical regions of convergence, particularly bilateral posterior superior temporal sulcus (pSTS), and at 'unimodal' levels of sensory processing. Whether the latter reflects feedback mechanisms or direct crosstalk between auditory and visual cortices is as yet unclear.

408 citations

Journal ArticleDOI
TL;DR: The results suggest that using steady-state noise to test speech intelligibility may underestimate the difficulties experienced by cochlear-implant users in fluctuating acoustic backgrounds.
Abstract: This study investigated the effects of simulated cochlear-implant processing on speech reception in a variety of complex masking situations. Speech recognition was measured as a function of target-to-masker ratio, processing condition (4, 8, 24 channels, and unprocessed) and masker type (speech-shaped noise, amplitude-modulated speech-shaped noise, single male talker, and single female talker). The results showed that simulated implant processing was more detrimental to speech reception in fluctuating interference than in steady-state noise. Performance in the 24-channel processing condition was substantially poorer than in the unprocessed condition, despite the comparable representation of the spectral envelope. The detrimental effects of simulated implant processing in fluctuating maskers, even with large numbers of channels, may be due to the reduction in the pitch cues used in sound source segregation, which are normally carried by the peripherally resolved low-frequency harmonics and the temporal fine structure. The results suggest that using steady-state noise to test speech intelligibility may underestimate the difficulties experienced by cochlear-implant users in fluctuating acoustic backgrounds.
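A small implementation note: the target-to-masker ratio in experiments of this kind is set by scaling the masker relative to the target's RMS level before mixing. A minimal sketch of that step (hypothetical helper, numpy only):

```python
import numpy as np

def mix_at_tmr(target, masker, tmr_db):
    """Mix target and masker so the target-to-masker ratio is tmr_db (RMS)."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    gain = rms(target) / (rms(masker) * 10.0 ** (tmr_db / 20.0))
    return target + gain * masker
```

The mixture would then be passed through the 4-, 8-, or 24-channel simulation (or left unprocessed) before presentation.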

386 citations

Journal ArticleDOI
TL;DR: The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility; the spectrotemporal modulations important for vocal gender identification occupy a different region than those required for comprehension.
Abstract: We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.
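The modulation filtering technique can be pictured as masking the two-dimensional Fourier transform of a log-magnitude spectrogram, keeping only temporal modulations below a cutoff in Hz and spectral modulations below a cutoff in cycles/kHz. The sketch below is an illustrative reconstruction of that general idea, not the authors' exact pipeline (which also resynthesizes audio from the filtered spectrogram):

```python
# Low-pass modulation filtering of a log-magnitude spectrogram (sketch).
# Cutoffs follow the values discussed above: 12 Hz temporal, 4 cycles/kHz spectral.
import numpy as np
from scipy.signal import stft

def modulation_lowpass(x, fs, wt_max=12.0, wf_max=4.0, nperseg=512, hop=128):
    f, t, S = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    logmag = np.log(np.abs(S) + 1e-9)
    M = np.fft.fft2(logmag)  # joint spectrotemporal modulation spectrum
    wf = np.fft.fftfreq(len(f), d=(f[1] - f[0]) / 1000.0)  # cycles/kHz
    wt = np.fft.fftfreq(len(t), d=t[1] - t[0])             # Hz
    keep = (np.abs(wf)[:, None] <= wf_max) & (np.abs(wt)[None, :] <= wt_max)
    return f, t, np.real(np.fft.ifft2(M * keep))  # filtered log-spectrogram
```

Resynthesis to a waveform additionally requires phase reconstruction, which is omitted here.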

375 citations


Cites background from "Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants"

  • ...However, in noisy environments, additional spectral information results in significant speech hearing improvement [20,25]....


  • ...have shown that intelligibility increased with additional spectral channels [25]....


References
Journal ArticleDOI
13 Oct 1995-Science
TL;DR: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information; the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
Abstract: Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.

2,865 citations

Journal ArticleDOI
TL;DR: The mean-squared level of each digitally recorded sentence was adjusted to equate intelligibility when presented in spectrally matched noise to normal-hearing listeners; the test's statistical reliability and efficiency suit it to practical applications in which measures of speech intelligibility are required.
Abstract: A large set of sentence materials, chosen for their uniformity in length and representation of natural speech, has been developed for the measurement of sentence speech reception thresholds (sSRTs). The mean-squared level of each digitally recorded sentence was adjusted to equate intelligibility when presented in spectrally matched noise to normal-hearing listeners. These materials were cast into 25 phonemically balanced lists of ten sentences for adaptive measurement of sSRTs. The 95% confidence interval for these measurements is ±2.98 dB for sSRTs in quiet and ±2.41 dB for sSRTs in noise, as defined by the variability of repeated measures with different lists. Average sSRTs in quiet were 23.91 dB(A). Average sSRTs in 72 dB(A) noise were 69.08 dB(A), or -2.92 dB signal/noise ratio. Low-pass filtering increased sSRTs slightly in quiet and noise as the 4- and 8-kHz octave bands were eliminated. Much larger increases in SRT occurred when the 2-kHz octave band was eliminated and bandwidth dropped below 2.5 kHz. Reliability was not degraded substantially until bandwidth dropped below 2.5 kHz. The statistical reliability and efficiency of the test suit it to practical applications in which measures of speech intelligibility are required.
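The reported signal/noise ratio is simple arithmetic: 69.08 dB(A) speech in 72 dB(A) noise gives 69.08 - 72 = -2.92 dB. Adaptive sSRT measurement of this kind typically follows a one-up/one-down level rule that converges on the 50%-correct point; the sketch below is a generic tracker of that type, not necessarily the exact rules used with these materials (present_sentence is a hypothetical callback):

```python
def adaptive_srt(present_sentence, start_db=70.0, step_db=2.0, n_trials=20):
    """One-up/one-down tracker converging on the 50%-correct sentence level.

    present_sentence(level_db) -> bool is assumed to play one sentence at
    level_db and report whether the listener repeated it correctly.
    """
    level, history = start_db, []
    for _ in range(n_trials):
        history.append(level)
        correct = present_sentence(level)
        level += -step_db if correct else step_db  # down if correct, up if not
    # Discard the first few approach trials and average the rest.
    return sum(history[4:]) / len(history[4:])
```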

1,909 citations

Journal ArticleDOI
TL;DR: Analysis of the formant data shows numerous differences between the present data and those of PB, both in terms of average frequencies of F1 and F2, and the degree of overlap among adjacent vowels.
Abstract: This study was designed as a replication and extension of the classic study of vowel acoustics by Peterson and Barney (PB) [J. Acoust. Soc. Am. 24, 175-184 (1952)]. Recordings were made of 50 men, 50 women, and 50 children producing the vowels /i, ɪ, ɛ, æ, ɝ, ʌ, ɑ, ɔ, ʊ, u/ in h–V–d syllables. Formant contours for F1–F4 were measured from LPC spectra using a custom interactive editing tool. For comparison with the PB data, formant patterns were sampled at a time that was judged by visual inspection to be maximally steady. Preliminary analysis shows numerous differences between the present data and those of PB, both in terms of average formant frequencies for vowels, and the degree of overlap among adjacent vowels. As with the original study, listening tests showed that the signals were nearly always identified as the vowel intended by the talker.
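LPC-based formant measurement, the general technique named above, fits an all-pole model to each windowed frame and reads candidate formant frequencies from the pole angles. A minimal sketch with illustrative model order and windowing choices, not the authors' custom interactive editing tool:

```python
# LPC formant estimation for one speech frame (autocorrelation method).
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, fs, order=12):
    """Return candidate formant frequencies (Hz), lowest first."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # LPC normal equations
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)  # pole angle -> frequency
    return np.sort(freqs[freqs > 90.0])           # drop near-DC poles
```

Real formant trackers also screen candidates by pole bandwidth and continuity across frames; F1–F4 would be the first four surviving candidates.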

1,891 citations

Journal ArticleDOI
TL;DR: Sixteen English consonants were spoken over voice communication systems with frequency distortion and with random masking noise, and an articulatory analysis of the resulting confusions yields five nearly independent features: voicing, nasality, affrication, duration, and place of articulation.
Abstract: Sixteen English consonants were spoken over voice communication systems with frequency distortion and with random masking noise. The listeners were forced to guess at every sound and a count was made of all the different errors that resulted when one sound was confused with another. With noise or low‐pass filtering the confusions fall into consistent patterns, but with high‐pass filtering the errors are scattered quite randomly. An articulatory analysis of these 16 consonants provides a system of five articulatory features or “dimensions” that serve to characterize and distinguish the different phonemes: voicing, nasality, affrication, duration, and place of articulation. The data indicate that voicing and nasality are little affected and that place is severely affected by low‐pass and noisy systems. The indications are that the perception of any one of these five features is relatively independent of the perception of the others, so that it is as if five separate, simple channels were involved rather than a single complex channel.
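The independence claim is usually quantified, as in the original study, by the information transmitted for each feature: collapse the 16×16 stimulus-response confusion matrix by a feature's values (e.g., voiced vs. voiceless) and compute the mutual information of the collapsed matrix. A sketch of that computation (the grouping labels are schematic):

```python
import numpy as np

def transmitted_information(counts):
    """Mutual information T(x; y) in bits from a stimulus-response count matrix."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)  # stimulus marginal
    py = p.sum(axis=0, keepdims=True)  # response marginal
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def collapse_by_feature(counts, labels):
    """Group a consonant confusion matrix by feature value (e.g., voicing)."""
    vals = sorted(set(labels))
    idx = {v: [i for i, lab in enumerate(labels) if lab == v] for v in vals}
    return np.array([[counts[np.ix_(idx[a], idx[b])].sum() for b in vals]
                     for a in vals])
```

Comparing each feature's transmitted information across filtering and noise conditions gives the pattern described above: voicing and nasality survive low-pass and noisy channels that destroy place information.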

1,842 citations

Journal ArticleDOI
TL;DR: It is shown that the newer extended data on human cadaver ears and from living animal preparations are quite well fit by the same basic function, which increases the function's value in plotting auditory data and in modeling concerned with speech and other bioacoustic signals.
Abstract: Accurate cochlear frequency-position functions based on physiological data would facilitate the interpretation of physiological and psychoacoustic data within and across species. Such functions might aid in developing cochlear models, and cochlear coordinates could provide potentially useful spectral transforms of speech and other acoustic signals. In 1961, an almost-exponential function was developed (Greenwood, 1961b, 1974) by integrating an exponential function fitted to a subset of frequency resolution-integration estimates (critical bandwidths). The resulting frequency-position function was found to fit cochlear observations on human cadaver ears quite well and, with changes of constants, those on elephant, cow, guinea pig, rat, mouse, and chicken (Békésy, 1960), as well as in vivo (behavioral-anatomical) data on cats (Schuknecht, 1953). Since 1961, new mechanical and other physiological data have appeared on the human, cat, guinea pig, chinchilla, monkey, and gerbil. It is shown here that the newer extended data on human cadaver ears and from living animal preparations are quite well fit by the same basic function. The function essentially requires only empirical adjustment of a single parameter to set an upper frequency limit, while a "slope" parameter can be left constant if cochlear partition length is normalized to 1 or scaled if distance is specified in physical units. Constancy of slope and form in dead and living ears and across species increases the probability that the function fitting human cadaver data may apply as well to the living human ear. This prospect increases the function's value in plotting auditory data and in modeling concerned with speech and other bioacoustic signals, since it fits the available physiological data well and, consequently (if those data are correct), remains independent of, and an appropriate means to examine, psychoacoustic data and assumptions.
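For reference, the Greenwood function has the almost-exponential form below; the constants shown are the commonly cited human values, with x the position along the cochlear partition normalized from 0 at the apex to 1 at the base (a hedged restatement, since the paper itself fits species-specific constants):

```latex
% Greenwood frequency-position function (commonly cited human constants);
% x is normalized position along the cochlear partition, apex (0) to base (1).
F(x) = A\left(10^{a x} - k\right), \qquad
A = 165.4~\mathrm{Hz}, \quad a = 2.1, \quad k = 0.88
% With these constants, F(0) \approx 19.8~\mathrm{Hz} and F(1) \approx 20.7~\mathrm{kHz}.
```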

1,789 citations