Author

Meng Yuan

Other affiliations: East China Normal University
Bio: Meng Yuan is an academic researcher from The Chinese University of Hong Kong. The author has contributed to research in topics: Tone (musical instrument) & Speech perception. The author has an h-index of 5, co-authored 12 publications receiving 89 citations. Previous affiliations of Meng Yuan include East China Normal University.

Papers
Journal ArticleDOI
TL;DR: This paper investigates the implementation of automatic speech recognition (ASR) on fixed-point digital signal processors (DSPs) and quantifies the trade-off between resource requirements and recognition performance.
Abstract: This paper provides a thorough description of the implementation of automatic speech recognition (ASR) algorithms on a fixed-point digital signal processor (DSP). It is intended to serve as a useful self-contained reference for DSP engineers to follow when developing similar applications. The work is based on a detailed analysis of hidden Markov model (HMM) based ASR algorithms. The computationally critical steps are clearly identified, and for each of them, different ways of optimization for real-time computation are suggested and evaluated. The trade-off among computational efficiency, memory requirements and recognition performance is illustrated quantitatively via three example systems, one for the recognition of isolated Chinese words and the other two for the recognition of English and Chinese digit strings, respectively. The paper also discusses other techniques that can be implemented to further improve the recognition performance in real-world applications.

25 citations

Journal ArticleDOI
TL;DR: The use of periodicity-enhanced TEPCs led to consistent improvement in tone identification accuracy; the improvement was more significant at low signal-to-noise ratios and more noticeable for female than for male voices.
Abstract: This study investigated the contributions of temporal periodicity cues and the effectiveness of enhancing these cues for Cantonese tone recognition in noise. A multichannel noise-excited vocoder was used to simulate speech processing in cochlear implants. Ten normal-hearing listeners were tested. Temporal envelope and periodicity cues (TEPCs) below 500 Hz were extracted from four frequency bands: 60-500, 500-1000, 1000-2000, and 2000-4000 Hz. The test stimuli were obtained by combining TEPC-modulated noise signals from individual bands. For periodicity enhancement, temporal fluctuations in the range 20-500 Hz were replaced by a sinusoid with frequency equal to the fundamental frequency of original speech. Tone identification experiments were carried out using disyllabic word carriers. Results showed that TEPCs from the two high-frequency bands were more important for tone identification than TEPCs from the low-frequency bands. The use of periodicity-enhanced TEPCs led to consistent improvement of tone identification accuracy. The improvement was more significant at low signal-to-noise ratios, and more noticeable for female than for male voices. Analysis of error distributions showed that the enhancement method reduced tone identification errors and did not show any negative effect on the recognition of segmental structures.

22 citations

Journal ArticleDOI
TL;DR: MAPPID-N was developed to assess the speech-recognition abilities in noise of Mandarin-speaking children on disyllabic words, and on lexical tones in monosyllabic words, in a picture-identification test format. The results suggest that the use of intensity and timing cue differences between the two ears improves with age.
Abstract: MAPPID-N was developed to assess the speech-recognition abilities in noise of Mandarin-speaking children on disyllabic words, and lexical tones in monosyllabic words, in a picture-identification test format. Twenty-six normal-hearing children aged four to nine years listened repeatedly to the test materials where noise was spatially mixed with or separated from speech, at different signal-to-noise ratios (SNRs), to obtain performance-SNR functions and SNR for 50% correct scores (SNR-50%). SNR-50% improved with age only when noise was spatially separated from speech but not when noise was mixed with speech, suggesting that the use of intensity and timing cue differences between the two ears improves with age. The homogeneity of the test items was improved by adjusting the intensity levels of individual test items to align their SNR-50% to the mean SNR-50% level. Copyright © 2009 John Wiley & Sons, Ltd.

10 citations

Journal ArticleDOI
TL;DR: The findings suggest that high-frequency bands carry TEPC that are important for lexical-tone identification, indicating the potential for improving speech recognition in tonal languages by manipulating TEPC via new signal processing algorithms in hearing prostheses.
Abstract: Objectives: Temporal envelope and periodicity components (TEPC) in the speech signal have the potential to offer important cues for speech recognition, especially in tonal languages. The aims of this study are: (i) to investigate the degree of contribution of TEPC to lexical tone identification in Cantonese; and (ii) to investigate whether or not the contributions vary among different frequency bands. The results of these investigations would reveal whether there are any frequency-specific TEPC that are important for lexical tone identification. Design: TEPC of monosyllabic words carrying different lexical tones were extracted by the method of full-wave rectification and low-pass filtering. They were used to modulate a speech-spectrum noise to create the test stimuli. Thus the stimuli contain only temporal envelope and periodicity components but no temporal fine structure of the original speech signal. Multiple sets of stimuli were created with different combinations of TEPC-modulated frequency bands. Eighteen adult subjects with normal hearing participated in the study. Results: Lexical tone identification was best when only the TEPC from the two high-frequency bands (1-2 kHz and 2-4 kHz) of the original signal were provided, and worst when only the TEPC from the two low-frequency bands (60-500 Hz and 500-1000 Hz) were provided. The findings suggested that high-frequency bands carry TEPC that are important for lexical-tone identification. Lexical tone identification performance was better for the male stimuli than the female ones. Conclusions: The results indicate the potential for improving speech recognition in tonal languages by manipulating TEPC via new signal processing algorithms in hearing prostheses.
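Illustrative sketch (not from the paper): the Design step above, full-wave rectification followed by low-pass filtering and then modulation of a noise carrier, might look roughly like the following for a single band. The function names, the one-pole filter, the 500 Hz cutoff default, and the use of uniform white noise in place of speech-spectrum noise are all assumptions made for illustration; the abstract does not specify the filter design or the per-band filtering.

```python
import math
import random


def extract_tepc(signal, sample_rate, cutoff_hz=500.0):
    """Return a temporal envelope: full-wave rectify, then low-pass filter.

    The one-pole IIR smoother is an assumed stand-in; the study's actual
    filter is not described in the abstract.
    """
    # Smoothing coefficient of a one-pole low-pass filter at cutoff_hz.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for x in signal:
        rectified = abs(x)                     # full-wave rectification
        state += alpha * (rectified - state)   # low-pass filtering
        envelope.append(state)
    return envelope


def modulate_noise(envelope, seed=0):
    """Multiply a noise carrier by the envelope, sample by sample.

    Uniform white noise is used here as a placeholder carrier (assumption).
    """
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope]


if __name__ == "__main__":
    sr = 16000  # hypothetical sampling rate in Hz
    # A 1 kHz tone whose amplitude rises over 100 ms stands in for band-limited speech.
    tone = [(n / 1600) * math.sin(2 * math.pi * 1000 * n / sr) for n in range(1600)]
    env = extract_tepc(tone, sr)
    stimulus = modulate_noise(env)
    print(len(stimulus), "samples; final envelope value:", round(env[-1], 3))
```

The resulting stimulus keeps the slow amplitude fluctuations of the input but discards its temporal fine structure, since the carrier is noise.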

9 citations


Cited by
Book
01 Jan 1968

1,644 citations

Journal ArticleDOI
TL;DR: This article presents a complete mixed-signal system-on-chip, capable of directly interfacing to an analog microphone and performing keyword spotting (KWS) and speaker verification (SV), without any need for further external accesses.
Abstract: The use of speech-triggered wake-up interfaces has grown significantly in the last few years for use in ubiquitous and mobile devices. Since these interfaces must always be active, power consumption is one of their primary design metrics. This article presents a complete mixed-signal system-on-chip, capable of directly interfacing to an analog microphone and performing keyword spotting (KWS) and speaker verification (SV), without any need for further external accesses. Through the use of: a) an integrated single-chip digital-friendly design; b) hardware-aware algorithmic optimization; and c) memory- and power-optimized accelerators, ultra-low power is achieved while maintaining high accuracy for speech recognition tasks. The 65-nm implementation achieves 18.3-μW worst-case power consumption or 10.6-μW power for typical real-time scenarios, 10× below state of the art (SoA).

63 citations

Journal ArticleDOI
TL;DR: It is suggested that pediatric cochlear implant recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients, and it is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding.
Abstract: Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients, leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise, nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further work is warranted to investigate the relationship between spectral and temporal resolution and speech recognition to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users.

48 citations

Journal ArticleDOI
TL;DR: It is demonstrated that although most single-channel NR algorithms could effectively improve speech recognition in noise for Mandarin-speaking cochlear implant listeners, these algorithms perform differently in various environmental noises, and it would be beneficial for the CI sound processor to integrate NR methods tailored to individual types of noises for the best cost and benefit tradeoff.
Abstract: Objectives: The purpose of this study was to (1) assess the contributions of single-channel noise-reduction (NR) algorithms for improving speech intelligibility for Mandarin-speaking cochlear implant (CI) listeners and (2) examine whether different algorithms perform differently in various environmen

43 citations