A comparison between native and non-native speech for automatic speech recognition

doi:10.1121/1.5101679

Journal Article

Seongjin Park, +1 more

- 23 Apr 2019 -

Journal of the Acoustical Society of Ame...

- Vol. 145, Iss: 3, pp 1827-1827

TLDR

Preliminary results suggest that non-native speakers of English fail to produce flaps and reduced vowels, insert or delete segments, engage in more self-correction, and place pauses in different locations from native speakers.

Abstract:

This study investigates differences in sentence and story production between native and non-native speakers of English for use with an automatic speech recognition (ASR) system. Previous studies have shown that production errors by non-native speakers of English include misproduced segments (Flege, 1995), longer pause duration (Anderson-Hsieh and Venkatagiri, 1994), abnormal pause location within clauses (Kang, 2010), and non-reduction of function words (Jang, 2009). The present study uses phonemically balanced sentences from TIMIT (Garofolo et al., 1993) and a story to provide an additional comparison of the differences in production by native and non-native speakers of English. Consistent with previous research, preliminary results suggest that non-native speakers of English fail to produce flaps and reduced vowels, insert or delete segments, engage in more self-correction, and place pauses in different locations from native speakers. Non-native English speakers furthermore produce different patterns of intonation from native speakers and produce errors indicative of transfer from their L1 phonology, such as coda deletion and vowel epenthesis. Native speaker productions also contained errors, the majority of which were content-related. These results indicate that the difficulties English ASR systems have in recognizing non-native speech are due largely to the heterogeneity of non-native production.

Citations

Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech.

Self-supervised end-to-end ASR for low resource L2 Swedish

Automatic Speech Recognition and Pronunciation Error Detection of Dutch Non-native Speech: cumulating speech resources in a pluricentric language

Audio Augmentation for Non-Native Children’s Speech Recognition through Discriminative Learning

Reconnaissance automatique de la parole : génération des prononciations non natives pour l'enrichissement du lexique

Related Papers (5)

Non-native speaker pause patterns closely correspond to those of native speakers at different speech rates.

Speech rate and pauses in non-native Finnish

Acoustic features of English sentences produced by native and non-native speakers

Accuracy and variability in vowel targets produced by native and non-native speakers of English

Acoustic variability in the production of English vowels by native and non-native speakers
