Proceedings Article

Speech synthesis from short-time Fourier transform magnitude and its application to speech processing

Abstract
This paper compares speech synthesis directly from the processed short-time Fourier transform magnitude (STFTM), using the LSEE-MSTFTM algorithm [6,7], with more conventional algorithms in several speech processing applications. Among the applications considered, the largest improvement occurs for time-scale modification of multiple-speaker speech and of noisy speech, because these input signals are not well modeled by the analysis/synthesis system used for comparison. For speech synthesis from speech model parameters, time-scale modification of clean speech, speech enhancement by spectral subtraction, and helium speech enhancement, however, the LSEE-MSTFTM algorithm yields no significant improvement, because the conventional approaches to these applications already have and use a good STFT phase estimate.
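
For orientation, the iteration usually associated with the LSEE-MSTFTM algorithm named in the abstract can be sketched roughly as follows: take the STFT of the current signal estimate, keep its phase but substitute the target magnitude, then resynthesize by least-squares overlap-add (windowed frames summed and divided by the summed squared window). This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names, the Hamming window, the frame length and hop, the random initial estimate, and the requirement that `length` equal the window length plus `hop` times (number of frames minus one) are all illustrative choices that do not come from the paper.

```python
import cmath
import math
import random


def hamming(n):
    """Periodic Hamming window; nonzero everywhere, so the divisor below never vanishes."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / n) for t in range(n)]


def dft(frame):
    """Naive O(n^2) DFT, adequate for the short frames used in this sketch."""
    n = len(frame)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(frame))
            for k in range(n)]


def idft_real(spec):
    """Naive inverse DFT, keeping only the real part of each output sample."""
    n = len(spec)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spec)).real / n
            for t in range(n)]


def stft(x, window, hop):
    """Windowed DFT of every full frame of x, with frame starts spaced hop samples apart."""
    n = len(window)
    return [dft([x[s + t] * window[t] for t in range(n)])
            for s in range(0, len(x) - n + 1, hop)]


def least_squares_istft(spec_frames, window, hop, length):
    """Least-squares overlap-add: sum(w * frame) / sum(w^2) at each sample."""
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(spec_frames):
        for t, v in enumerate(idft_real(spec)):
            num[m * hop + t] += window[t] * v
            den[m * hop + t] += window[t] ** 2
    return [a / d if d > 0.0 else 0.0 for a, d in zip(num, den)]


def lsee_mstftm(target_mag, window, hop, length, iterations=20, seed=0):
    """Iteratively estimate a signal whose STFT magnitude approaches target_mag.

    Returns the final estimate and the squared magnitude error measured at the
    start of each iteration. The white-noise initial estimate is an assumption.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    errors = []
    for _ in range(iterations):
        current = stft(x, window, hop)
        errors.append(sum((abs(c) - m) ** 2
                          for cf, mf in zip(current, target_mag)
                          for c, m in zip(cf, mf)))
        # Keep the phase of the current estimate, impose the target magnitude.
        modified = [[m * (c / abs(c)) if abs(c) > 0.0 else complex(m)
                     for c, m in zip(cf, mf)]
                    for cf, mf in zip(current, target_mag)]
        x = least_squares_istft(modified, window, hop, length)
    return x, errors


if __name__ == "__main__":
    win = hamming(16)
    tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
    mags = [[abs(c) for c in frame] for frame in stft(tone, win, 4)]
    _, errs = lsee_mstftm(mags, win, 4, 64)
    print(errs[0], errs[-1])
```

In this sketch the phase is discarded and rebuilt from a noise start. The abstract's point is that the applications with no significant gain are those where the conventional method already has a good STFT phase estimate available.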


Citations
Journal Article

STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement

TL;DR: It is shown that, when the noisy phase is enhanced using the proposed phase reconstruction, instrumental measures predict an increase of speech quality over a range of signal-to-noise ratios, even without explicit amplitude enhancement.
Journal Article

Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra

TL;DR: An algorithm for estimating signals from short-time magnitude spectra is introduced, offering a significant improvement in quality and efficiency over current methods; it is applied to audio time-scale and pitch modification and compared to classical algorithms for these tasks on a variety of signal types.
Proceedings Article

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

TL;DR: This work extends state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS system trained only on the ASR corpora itself, closing the gap to a comparable oracle experiment by more than 50%.
Patent

Harmonic adaptive speech coding method and system

TL;DR: In this article, a method and system for encoding and decoding of speech signals at a low bit rate is presented, where continuous input speech is divided into voiced and unvoiced time segments of a predetermined length.
Proceedings Article

Single Pass Spectrogram Inversion

TL;DR: The Single-Pass Spectrogram Inversion (SPSI) algorithm is similar to the synthesis step in phase-locked vocoders, but with phase rates at spectral peaks determined solely from the magnitude spectra using quadratic interpolation.
References
Journal Article

Signal estimation from modified short-time Fourier transform

TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
Journal Article

Short term spectral analysis, synthesis, and modification by discrete Fourier transform

TL;DR: In this article, a theory of short term spectral analysis, synthesis, and modification is presented with an attempt at pointing out certain practical and theoretical questions, which are useful in designing filter banks when the filter bank outputs are to be used for synthesis after multiplicative modifications are made to the spectrum.
Proceedings Article

Signal estimation from modified short-time Fourier transform

TL;DR: An algorithm to estimate a signal from its modified short-time Fourier transform (STFT) by minimizing the mean squared error between the STFT of the estimated signal and the modified STFT magnitude is presented.
Journal Article

Multimicrophone signal‐processing technique to remove room reverberation from speech signals

TL;DR: A multimicrophone digital processing scheme is presented for removing much of the degrading distortion in acoustic recordings produced in untreated rooms by dividing microphone signals into frequency bands whose corresponding outputs are cophased and added.
Journal Article

Time-scale modification of speech based on short-time Fourier analysis

TL;DR: In this paper, the authors developed the theoretical basis for time-scale modification of speech based on short-time Fourier analysis and developed a high quality system for changing the apparent rate of articulation of recorded speech, while at the same time preserving such qualities as naturalness, intelligibility, and speaker-dependent features.