Open Access, Proceedings Article

High resolution signal reconstruction

T. Kristjansson, +1 more
- pp 291-296
Abstract
We present a framework for speech enhancement and robust speech recognition that exploits the harmonic structure of speech. The method uses a high-frequency-resolution speech model in the log-spectrum domain and reconstructs the signal from the estimated posteriors of the clean signal and the phases of the original noisy signal. It achieves substantial gains in the signal-to-noise ratio (SNR) of the enhanced speech and considerable gains in automatic speech recognition accuracy in very noisy conditions. For speech at 0 dB SNR, enhancement yields an SNR gain of 8.38 dB. We also present recognition results on the Aurora 2 dataset. At 0 dB SNR, we achieve a relative word error rate reduction of 43.75% over the baseline and 15.90% over the equivalent low-resolution algorithm.
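The reconstruction step described in the abstract (an estimate of the clean log-spectrum combined with the phases of the noisy signal) can be illustrated with a minimal single-frame sketch. This is not the authors' implementation: the function names, the use of a log-magnitude spectrum, the naive DFT, and the toy sinusoid with an "oracle" spectrum standing in for the model's posterior estimate are all assumptions made for illustration. The helper `relative_reduction` shows how a relative error-rate reduction is usually computed; its inputs are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short demo frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def reconstruct_frame(log_mag_estimate, noisy_frame):
    """Pair an estimated clean log-magnitude spectrum with the noisy phase.

    Assumption: the estimate is a natural-log magnitude per DFT bin.
    """
    noisy_spectrum = dft(noisy_frame)
    spectrum = [
        cmath.rect(math.exp(log_mag), cmath.phase(y))
        for log_mag, y in zip(log_mag_estimate, noisy_spectrum)
    ]
    return idft(spectrum)


def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the clean reference."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


def relative_reduction(baseline_error, new_error):
    """Relative error reduction in percent, e.g. for word error rates."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy data (hypothetical): one sinusoid frame in white noise at roughly 0 dB SNR.
random.seed(0)
n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
noisy = [c + random.gauss(0.0, math.sqrt(0.5)) for c in clean]

# Oracle log-magnitude of the clean frame stands in for the model's estimate;
# the small floor avoids log(0) in empty bins.
oracle_log_mag = [math.log(abs(c) + 1e-12) for c in dft(clean)]

enhanced = reconstruct_frame(oracle_log_mag, noisy)
gain_db = snr_db(clean, enhanced) - snr_db(clean, noisy)
print(f"SNR gain on the toy frame: {gain_db:.2f} dB")
```

Because the sketch uses an oracle spectrum, the gain it prints is an upper bound for this toy frame and says nothing about the 8.38 dB figure reported in the abstract; the only residual error comes from reusing the noisy phase.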


Citations
Proceedings Article

Single microphone source separation using high resolution signal reconstruction

TL;DR: A method for separating two speakers from a single microphone channel that exploits the fine structure of male and female speech and relies on a strong high frequency resolution model for the source signals.

Model-Based Scene Analysis

TL;DR: This chapter examines the general idea of using models in source separation, including what form these models can take and how they can be acquired, and describes several example systems.
Journal Article

Noisy Speech Enhancement Using Harmonic-Noise Model and Codebook-Based Post-Processing

TL;DR: The performance gain obtained from the proposed post-processing speech restoration module is evaluated and compared with standard speech enhancement systems, showing substantial gains in perceptual quality.
Journal Article

Speech Enhancement Using Gaussian Scale Mixture Models

TL;DR: The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN), and effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
Journal Article

Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation

TL;DR: A new approximate Bayesian estimator for enhancing a noisy speech signal that offers improved signal-to-noise ratio, lower word recognition error rate, and less spectral distortion.