Xu Shao
Researcher at Nuance Communications
Publications - 23
Citations - 1404
Xu Shao is an academic researcher from Nuance Communications. The author has contributed to research in the topics of Mel-frequency cepstrum and speech processing, has an h-index of 9, and has co-authored 23 publications receiving 1225 citations. Previous affiliations of Xu Shao include the University of East Anglia and the University of Sheffield.
Papers
Journal ArticleDOI
An audio-visual corpus for speech perception and automatic speech recognition
TL;DR: An audio-visual corpus consisting of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers, designed to support the use of common material in speech perception and automatic speech recognition studies.
Journal ArticleDOI
Prediction of Fundamental Frequency and Voicing From Mel-Frequency Cepstral Coefficients for Unconstrained Speech Reconstruction
Ben Milner,Xu Shao +1 more
TL;DR: Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
Proceedings Article
Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
Ben Milner,Xu Shao +1 more
TL;DR: This work presents a method of reconstructing a speech signal from a stream of MFCC vectors using a source-filter model of speech production, and listening tests reveal that the reconstructed speech is intelligible and of similar quality to a system based on LPC analysis of the original speech.
Journal ArticleDOI
Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment
Xu Shao,Jon Barker +1 more
TL;DR: The paper presents a novel solution that combines both audio and visual information to estimate acoustic SNR. It relates the use of visual information in the current system to its role in recent simultaneous-speaker intelligibility studies, where, as well as providing phonetic content, visual information triggers 'informational masking release', helping the listener attend selectively to the target speech stream.
Journal ArticleDOI
Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end
Ben Milner,Xu Shao +1 more
TL;DR: Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.