
An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion

Ji Ming and one co-author
01 Nov 2014
Vol. 28, Iss. 6, pp. 1269-1286
TLDR
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality.
About
This article is published in Computer Speech & Language. It was published on 2014-11-01 and is currently open access. It has received 8 citations to date. The article focuses on the topics of speech enhancement and voice activity detection.


Citations

Authentication and recovery algorithm for speech signal based on digital watermarking

TL;DR: A new compression method for speech signals based on the discrete cosine transform is discussed, and the resulting compressed signals are used for tamper recovery; a block-based, large-capacity embedding method is also explored for embedding the compressed signal.

Speech enhancement based on simple recurrent unit network

TL;DR: A novel speech enhancement method based on the simple recurrent unit (SRU) is proposed; it achieves significant improvements in training speed and can balance performance against training time.

Speech Enhancement Based on Full-Sentence Correlation and Clean Speech Recognition

TL;DR: A novel realization is proposed that integrates full-sentence speech correlation with clean speech recognition, formulated as a constrained maximization problem, to overcome the data sparsity problem; it is able to significantly outperform conventional methods that use optimized noise tracking.

A security watermark scheme used for digital speech forensics

TL;DR: Theoretical analysis and experimental results show that the proposed scheme is inaudible and robust against desynchronization attacks, enhances the security of the watermark system, and performs well for speech forensics.

A speech content authentication algorithm based on a novel watermarking method

TL;DR: Theoretical analysis and experiments demonstrate that the proposed speech content authentication algorithm is robust against desynchronization attacks, improves security, and performs well at locating tampered regions.
Frequently Asked Questions
Q1. What are the contributions of the paper "An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion"?

This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The authors also present an improved method for modeling noise for speech estimation, and an iterative algorithm that updates the noise and channel estimates of the corpus data model. In speech recognition experiments on the Aurora 4 database, using their enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system.

The likelihood of the match between the two segments $Y_{t:\tau}$ and $S_{\zeta:\eta}$ is determined by optimizing the parameters $g$, $q_k$ and $h_k$ at the segment level, assuming stationary noise and a constant channel characteristic within the segment.

In other words, given a noisy segment $Y_{t:\tau}$, $p(Y_{t:\tau} \mid \lambda_{S_{\zeta:\eta}})$ is the likelihood of that segment under stationary noise and a matching corpus segment $S_{\zeta:\eta}$, subject to a time-invariant channel factor.
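One way to make the segment-level optimization concrete is a brute-force grid search. The sketch below is not the paper's model: it assumes that in band k the noisy power is `h_k * g * s + q_k` (channel gain times speech gain times clean corpus power, plus noise power), with Gaussian error in the log-power domain, and that `g` is shared across bands while `h_k` and `q_k` are chosen per band and held constant over the segment. The function names, the grids and `sigma` are hypothetical.

```python
import math
from itertools import product


def band_loglik(noisy, clean, g, h, q, sigma=1.0):
    # Sum of Gaussian log-densities of the log-power error in one band.
    ll = 0.0
    for y, s in zip(noisy, clean):
        pred = h * g * s + q  # assumed model: channel * speech gain * clean power + noise power
        err = math.log(y) - math.log(pred)
        ll += -0.5 * (err / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ll


def segment_loglik(Y, S, gains, channels, noises):
    # Y, S: equal-length lists of frames; each frame is a list of band powers.
    # g is shared by all bands; h_k and q_k are picked per band. All three stay
    # fixed over the segment (stationary noise, time-invariant channel).
    n_bands = len(Y[0])
    best = (-math.inf, None, None, None)
    for g in gains:
        total, hs, qs = 0.0, [], []
        for k in range(n_bands):
            y_k = [frame[k] for frame in Y]
            s_k = [frame[k] for frame in S]
            ll, h, q = max((band_loglik(y_k, s_k, g, h, q), h, q)
                           for h, q in product(channels, noises))
            total += ll
            hs.append(h)
            qs.append(q)
        if total > best[0]:
            best = (total, g, hs, qs)
    return best  # (log-likelihood, g, [h_k], [q_k])


# Synthetic check: a noisy segment built from S with per-band scaling 2.0 and 1.0.
S = [[1.0, 2.0], [3.0, 1.0], [0.5, 4.0]]
Y = [[2.0 * s0 + 0.1, 1.0 * s1 + 0.2] for s0, s1 in S]
ll, g, hs, qs = segment_loglik(Y, S, [1.0, 2.0], [0.5, 1.0, 2.0], [0.1, 0.2])
```

In this toy model `g` and `h_k` trade off against each other (only their product is identifiable), so the check should look at `g * hs[k]` and `qs` rather than at `g` alone.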

By introducing channel and noise compensation into the corpus GMM, the authors thereby introduce the compensation into all corpus utterances that are built on the GMM used for finding the matched segments.
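Why compensating the GMM once affects every corpus utterance can be sketched as follows, under two assumptions that are ours rather than the paper's: each corpus utterance is stored as a sequence of GMM component indices, and component means live in the log-power domain and are compensated with the common log-add approximation `mu' = log(h_k * exp(mu) + q_k)`. Function names are hypothetical.

```python
import math


def compensate_gmm_means(means, channel, noise):
    # means: one list of log-power band means per GMM component.
    # Log-add approximation per band k: mu' = log(h_k * exp(mu) + q_k).
    return [[math.log(h * math.exp(mu) + q) for mu, h, q in zip(m, channel, noise)]
            for m in means]


def utterance_models(corpus_labels, means):
    # Hypothetical corpus representation: each utterance is a sequence of
    # component indices, so its frame models are lookups into the shared means.
    return [[means[i] for i in utt] for utt in corpus_labels]


means = [[0.0, math.log(2.0)], [1.0, 1.0]]
corpus = [[0, 1, 0], [1, 1]]
comp = compensate_gmm_means(means, channel=[1.0, 0.5], noise=[1.0, 0.0])
models = utterance_models(corpus, comp)  # every utterance now refers to the compensated means
```

The design point is that compensation cost scales with the number of GMM components, not with the size of the corpus.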

This new noise model was used as an alternative to the white-noise model: when calculating the likelihood of the measurement in (3) and (5), whichever of the two noise models produced the larger likelihood was used.
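The "use whichever model gives the larger likelihood" rule can be sketched as below. The measurement likelihood here is an assumed stand-in for (3) and (5), which are not reproduced in this excerpt: noisy band powers `y` are compared with clean powers `s` plus a noise power vector, with Gaussian error in the log-power domain. The model names and numbers are illustrative only.

```python
import math


def measurement_loglik(y, s, noise, sigma=1.0):
    # Assumed likelihood: Gaussian error between log(y_k) and log(s_k + n_k).
    ll = 0.0
    for yk, sk, nk in zip(y, s, noise):
        err = math.log(yk) - math.log(sk + nk)
        ll += -0.5 * (err / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ll


def best_noise_model(y, s, models):
    # models: dict of name -> noise power vector. Returns (name, loglik) of the
    # model producing the larger likelihood for this measurement.
    return max(((name, measurement_loglik(y, s, n)) for name, n in models.items()),
               key=lambda item: item[1])


models = {"white": [1.0, 1.0], "new": [1.0, 4.0]}  # flat vs. band-dependent noise
choice = best_noise_model([2.0, 5.0], [1.0, 1.0], models)
```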

The authors propose the longest matching segment (LMS) approach: at each time $t$, they find the longest noisy segment starting at $t$ over which the noise can be assumed stationary and which has a matching corpus speech segment, subject to a constant channel factor.
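The search structure of LMS can be illustrated with a deliberately simplified matcher. Instead of the likelihood with noise and channel optimization, this sketch (an assumption for illustration) accepts a match while the running mean squared distance per frame stays within a tolerance `tol`; it extends each candidate corpus start `zeta` frame by frame and stops at the first violation. Names are hypothetical and the search is brute force.

```python
def frame_dist(a, b):
    # Squared Euclidean distance between two feature frames.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def longest_match_at(Y, S, t, tol):
    # Return (length, zeta) of the longest noisy segment Y[t:t+L] matched by
    # some corpus segment S[zeta:zeta+L]; (0, None) if no frame matches.
    best_len, best_zeta = 0, None
    for zeta in range(len(S)):
        total, L = 0.0, 0
        while t + L < len(Y) and zeta + L < len(S):
            total += frame_dist(Y[t + L], S[zeta + L])
            if total / (L + 1) > tol:  # running mean distance too large: stop extending
                break
            L += 1
        if L > best_len:
            best_len, best_zeta = L, zeta
    return best_len, best_zeta


def lms_all(Y, S, tol):
    # Apply the search at every time t, as the LMS approach does.
    return [longest_match_at(Y, S, t, tol) for t in range(len(Y))]


S = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy corpus
Y = [[9.0], [1.0], [2.0], [3.0], [9.0]]  # toy "noisy" sequence
matches = lms_all(Y, S, tol=0.1)
```

Longer matches carry more context, which is the motivation for preferring the longest segment over frame-by-frame matching.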

The smoothed channel and noise estimates can be used to modify the wideband, clean corpus speech model to reduce its mismatch with the noisy measurement, or to reduce the level of distortion in the noisy measurement itself, thereby reducing the error in segment matching.
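The second option (cleaning up the measurement) can be sketched as follows. Both steps are assumptions, since the excerpt does not say how the smoothing is done: per-frame estimates are smoothed with a first-order recursive average, and the measurement is corrected by subtracting the noise power and dividing out the channel gain, with a spectral floor to keep powers positive (the inverse of a `y = h * x + q` power model).

```python
def smooth(estimates, alpha=0.9):
    # First-order recursive (exponential) smoothing of per-frame estimates,
    # each a list of band values; alpha is a hypothetical smoothing constant.
    out, prev = [], None
    for e in estimates:
        if prev is None:
            prev = list(e)
        else:
            prev = [alpha * p + (1 - alpha) * x for p, x in zip(prev, e)]
        out.append(prev)
    return out


def reduce_distortion(y, h, q, floor=0.01):
    # Remove additive noise power q_k, then divide out channel gain h_k.
    # The floor (a fraction of y_k) avoids negative or zero power.
    return [max(yk - qk, floor * yk) / hk for yk, hk, qk in zip(y, h, q)]


h_smooth = smooth([[1.0, 0.5], [3.0, 0.5]], alpha=0.5)[-1]  # -> per-band channel estimate
cleaned = reduce_distortion([5.0, 1.0], [2.0, 1.0], [1.0, 2.0], floor=0.1)
```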

This is subject to the constraint that the power of the speech-plus-noise model should not exceed the power of the noisy measurement; using the speech-gain resolution to quantize the noise-gain search range reduces the computation required for (3) and (5).
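A minimal sketch of the constrained, quantized search, assuming scalar frame powers and a shared dB grid that runs downward from 0 dB (the step size, range and function name are hypothetical): the speech gain and the noise gain use the same resolution, and any pair whose modeled power exceeds the measured power is discarded before the likelihood would be evaluated.

```python
def candidate_gains(y_power, s_power, q_power, step_db=1.0, max_db=30.0):
    # Shared grid: 0 dB down to -max_db in steps of step_db, as linear power gains.
    n_steps = int(max_db / step_db)
    grid = [10 ** (-i * step_db / 10) for i in range(n_steps + 1)]
    pairs = []
    for g in grid:          # speech gain
        for n in grid:      # noise gain, quantized with the same resolution
            if g * s_power + n * q_power <= y_power:  # model power must not exceed measurement
                pairs.append((g, n))
    return pairs


pairs = candidate_gains(1.5, 1.0, 1.0, step_db=10.0, max_db=10.0)
```

With a coarser step or a tighter power constraint, fewer `(g, n)` pairs survive, which is where the saving in computation comes from.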

Like the first part of the test data, the second part includes six test sets with both noise and channel distortion, plus one test set with channel distortion only (no added noise); each test set contains 330 utterances.