scispace - formally typeset

Xu Shao

Researcher at Nuance Communications

Publications: 23
Citations: 1404

Xu Shao is an academic researcher from Nuance Communications. The author has contributed to research in the topics of Mel-frequency cepstrum and speech processing. The author has an h-index of 9, and has co-authored 23 publications receiving 1225 citations. Previous affiliations of Xu Shao include the University of East Anglia and the University of Sheffield.

Papers
Journal ArticleDOI

An audio-visual corpus for speech perception and automatic speech recognition

TL;DR: An audio-visual corpus is presented that consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers, supporting the use of common material in speech perception and automatic speech recognition studies.
Journal ArticleDOI

Prediction of Fundamental Frequency and Voicing From Mel-Frequency Cepstral Coefficients for Unconstrained Speech Reconstruction

TL;DR: Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
Proceedings Article

Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model

Ben Milner, +1 more
TL;DR: This work presents a method of reconstructing a speech signal from a stream of MFCC vectors using a source-filter model of speech production, and listening tests reveal that the reconstructed speech is intelligible and of similar quality to a system based on LPC analysis of the original speech.
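The source-filter idea behind this reconstruction can be sketched as follows: a periodic excitation at the predicted fundamental frequency is passed through a vocal-tract filter whose envelope would, in the paper's pipeline, be derived from the MFCC vectors. The sketch below is a minimal illustration of that model, not the authors' actual method; the filter coefficients are placeholders standing in for an MFCC-derived envelope.

```python
import numpy as np
from scipy.signal import lfilter

def source_filter_synthesis(f0_hz, lpc_coeffs, duration_s=0.5, sr=16000):
    """Synthesize a voiced segment: an impulse-train excitation at f0
    driven through an all-pole (LPC-style) vocal-tract filter.
    The coefficients here are placeholders; in the paper's system the
    spectral envelope would come from the MFCC stream."""
    n = int(duration_s * sr)
    period = int(sr / f0_hz)              # samples per pitch period
    excitation = np.zeros(n)
    excitation[::period] = 1.0            # impulse train = voiced source
    # all-pole filter: y[t] = x[t] - sum_k a_k * y[t-k]
    return lfilter([1.0], np.concatenate(([1.0], lpc_coeffs)), excitation)

# Illustrative stable 2nd-order resonator near 500 Hz (placeholder envelope)
r, theta = 0.97, 2 * np.pi * 500 / 16000
a = np.array([-2 * r * np.cos(theta), r * r])
y = source_filter_synthesis(120.0, a)
```

Changing `f0_hz` shifts the pitch of the excitation independently of the filter, which is the property that makes separate fundamental-frequency prediction (as in the papers above) usable for reconstruction.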
Journal ArticleDOI

Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment

TL;DR: The paper presents a novel solution that combines audio and visual information to estimate acoustic SNR. It also relates the use of visual information in the current system to its role in recent simultaneous-speaker intelligibility studies, where, as well as providing phonetic content, it triggers "informational masking release", helping the listener attend selectively to the target speech stream.
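In multistream recognition of this kind, a stream weight scales the audio log-likelihood against the visual one before decoding. The sketch below illustrates that combination only; the sigmoid SNR-to-weight mapping is an assumption for demonstration, not the estimator proposed in the paper.

```python
import numpy as np

def combine_stream_loglik(loglik_audio, loglik_visual, snr_db):
    """Combine audio and visual stream log-likelihoods with a stream
    weight lam in [0, 1]. The sigmoid mapping from SNR to lam below is
    a placeholder, not the paper's actual SNR-based estimator."""
    lam = 1.0 / (1.0 + np.exp(-0.5 * snr_db))   # high SNR -> trust audio
    return lam * loglik_audio + (1.0 - lam) * loglik_visual

# At very low SNR the combined score tracks the visual stream;
# at high SNR it tracks the audio stream.
low_snr = combine_stream_loglik(-10.0, -2.0, snr_db=-20.0)
high_snr = combine_stream_loglik(-10.0, -2.0, snr_db=20.0)
```

The weighted-sum form is the standard multistream combination; what distinguishes systems like the one summarized above is how the weight itself is estimated from the acoustic and visual evidence.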
Journal ArticleDOI

Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end

TL;DR: Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.