An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion

Question

Q1. What are the contributions mentioned in the paper "An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion"?

Q2. How is the likelihood function of Yt:τ determined?

Q3. What is the likelihood function of Yt:τ associated with Sζ:η?

Q4. How do the authors add compensation into the corpus speech model?

Q5. What is the noise model used to calculate the likelihood of the measurement?

Q6. How long does the LMS approach take to find the longest noisy segment?

Q7. What is the difference between the smoothed channel and noise estimate?

Q8. What is the effect of the speech gain resolution on the search?

Q9. How many utterances are in the second part of the test data?

Accepted Answer

This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The authors also present an improved method for modeling noise for speech estimation. In addition, they present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of their enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system.

Accepted Answer

The likelihood of the match between the two segments Yt:τ and Sζ:η is determined by optimizing the parameters g, qk and hk at the segment level, assuming stationary noise and a constant channel characteristic within the segment.

Accepted Answer

In other words, given a noisy segment Yt:τ, p(Yt:τ|λSζ:η) indicates the likelihood of the noisy segment with stationary noise and with an accordingly matched corpus segment Sζ:η, subject to a time-invariant channel factor.

Accepted Answer

By introducing channel and noise compensation into the corpus GMM, the authors therefore introduce the compensation into all the corpus utterances built on the GMM used for finding the matched segments.

Accepted Answer

This new noise model was used as an alternative to the white noise model in calculating the likelihood of the measurement in (3) and (5); whichever of the two noise models produced the larger likelihood was used.
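
The selection rule described in this answer (score the measurement under each candidate noise model and keep the one with the larger likelihood) can be illustrated with a minimal sketch. It is not the paper's equations (3) and (5): the diagonal-Gaussian likelihood, the model names, and all numbers below are assumptions chosen only to show the comparison step.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal Gaussian (illustrative stand-in)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def pick_noise_model(frame, models):
    """Return (name, loglik) of the candidate model with the largest log-likelihood."""
    scored = {
        name: gaussian_loglik(frame, m["mean"], m["var"])
        for name, m in models.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical 4-band measurement and two hypothetical noise models.
frame = [1.0, 0.8, 0.3, 0.1]
models = {
    "white": {"mean": [0.5, 0.5, 0.5, 0.5], "var": [0.1] * 4},        # flat spectrum
    "alternative": {"mean": [1.0, 0.7, 0.3, 0.1], "var": [0.1] * 4},  # shaped spectrum
}
best_name, best_ll = pick_noise_model(frame, models)
```

With these made-up values the shaped model fits the frame more closely than the flat one, so it is the one selected.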

Accepted Answer

The authors propose the longest matching segment (LMS) approach: at each time t, they find the longest noisy segment from t that can assume stationary noise and has an accordingly matched corpus speech segment, subject to a constant channel factor.
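
The idea of growing a segment from t for as long as a matching corpus segment exists can be sketched with a toy example. This is a simplification, not the authors' algorithm: it ignores noise and gain, works on scalar frames, and treats "matched subject to a constant channel factor" as "equal up to one constant offset", which is how a fixed channel would appear in a log-spectral domain. The function names and data are hypothetical.

```python
def segment_matches(noisy, corpus, t, length, tol=1e-6):
    """True if noisy[t:t+length] equals some corpus segment up to one constant offset."""
    for start in range(len(corpus) - length + 1):
        offsets = [noisy[t + i] - corpus[start + i] for i in range(length)]
        if max(offsets) - min(offsets) <= tol:
            return True
    return False

def longest_matching_length(noisy, corpus, t):
    """Length of the longest segment starting at t that still has a corpus match."""
    best = 0
    for length in range(1, len(noisy) - t + 1):
        # Any prefix of a matched segment is itself matched, so the first
        # failure means no longer segment from t can match either.
        if not segment_matches(noisy, corpus, t, length):
            break
        best = length
    return best

corpus = [1.0, 3.0, 2.0, 5.0, 4.0]
noisy = [3.5, 2.5, 5.5, 9.0]  # corpus[1:4] shifted by 0.5, then an unrelated frame
lengths = [longest_matching_length(noisy, corpus, t) for t in range(len(noisy))]
```

Here `lengths` is `[3, 2, 1, 1]`: the first three noisy frames match `corpus[1:4]` under a single offset, and the match cannot be extended through the unrelated last frame.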

Accepted Answer

The smoothed channel and noise estimates can be used to modify the wideband, clean corpus speech model to reduce the mismatch against the noisy measurement, or used to reduce the level of distortion in the noisy measurement, thereby reducing the error in segment matching.

Accepted Answer

This is subject to the constraint that the power of the model of speech plus noise should not exceed the power of the noisy measurement; the use of the speech gain resolution to quantize the noise gain range for the search reduces the amount of computation for (3) and (5).
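
A minimal sketch of how such a constraint can shrink a search grid is given below. It assumes a linear power domain, a single scalar power per signal, and a fixed step size standing in for the gain resolution; none of these details come from the source, and the function and parameter names are hypothetical.

```python
def candidate_noise_gains(speech_power, noise_unit_power, measured_power, step):
    """Quantized noise gains q with speech_power + q * noise_unit_power <= measured_power."""
    headroom = measured_power - speech_power
    if headroom < 0 or noise_unit_power <= 0 or step <= 0:
        return []  # no admissible gain under the power constraint
    max_steps = int(headroom // (noise_unit_power * step))
    return [k * step for k in range(max_steps + 1)]

# Hypothetical powers: the speech model already accounts for 4.0 of a measured 7.0.
gains = candidate_noise_gains(
    speech_power=4.0, noise_unit_power=2.0, measured_power=7.0, step=0.5
)
```

Only the gains `[0.0, 0.5, 1.0, 1.5]` remain to be searched in this example, since any larger gain would push the modeled power above the measured power.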

Accepted Answer

Like the first part of test data, the second part of test data includes six test sets with both noise and channel distortion, plus one test set without noise and with channel distortion only; each test set contains 330 utterances.


