Effect of phase-sensitive environment model and higher order VTS on noisy speech feature enhancement [speech recognition applications]
Citations
Cites background or methods from this paper:
...Since a numerical evaluation of the resulting integrals is computationally very demanding, if not practically impossible, the observation probability is approximated by a Gaussian, where the effect of the phase factor is modelled as a contribution either to the mean [4], to the variance [2], or to both mean and variance [3]....
...However, it is well known that a more accurate model is obtained if a phase factor α, which results from the unknown phase difference between the complex short-term discrete-time Fourier transforms of speech and noise, is taken into account [1, 2, 3, 4]....
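The phase factor α enters through the power-domain relation |Y|² = |X|² + |N|² + 2α|X||N|. Taking logs of the per-channel powers (x = log|X|², and likewise n and y) gives y = x + log(1 + e^(n−x) + 2α e^((n−x)/2)). The sketch below assumes natural-log power features in a single channel and a fixed, known α; the function name is hypothetical. Setting α = 0 recovers the conventional phase-insensitive log-add model, and α = 1 corresponds to magnitudes adding in phase.

```python
import numpy as np


def phase_sensitive_log_power(x, n, alpha):
    """Noisy log-power y for clean log-power x, noise log-power n and phase factor alpha.

    Hypothetical helper: implements y = log(e^x + e^n + 2*alpha*e^((x+n)/2)),
    rewritten as x + log(1 + e^(n-x) + 2*alpha*e^((n-x)/2)).
    Requires |alpha| <= 1; alpha = -1 with x == n gives log(0) = -inf.
    """
    ratio = np.exp((n - x) / 2.0)  # |N| / |X|
    return x + np.log1p(ratio**2 + 2.0 * alpha * ratio)


x, n = 1.0, 0.5
# alpha = 0: conventional model, y = log(e^x + e^n)
print(phase_sensitive_log_power(x, n, 0.0), np.logaddexp(x, n))
# alpha = 1: magnitudes add in phase, y = 2 * log(e^(x/2) + e^(n/2))
print(phase_sensitive_log_power(x, n, 1.0), 2 * np.logaddexp(x / 2, n / 2))
```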
...Subsequently, the observation probability p(y|x,n) can be determined either by a Vector Taylor Series (VTS) approximation up to linear [5] or higher-order terms [3], or by Monte Carlo integration [4]....
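For a single channel with x and n fixed, the only randomness in p(y|x,n) is α. A minimal sketch, assuming α is zero-mean Gaussian with standard deviation sigma_alpha (clipped to [−0.99, 0.99] when sampling so the logarithm stays finite), compares a Taylor expansion of y in α around α = 0 with Monte Carlo moment matching. In the linear expansion α only adds variance; the second-order term also shifts the mean, which mirrors the mean/variance distinction in the excerpts above. All names are hypothetical, and the Gaussian α is an assumption, not necessarily the distribution used in [3, 4, 5].

```python
import numpy as np


def y_of_alpha(x, n, alpha):
    """Phase-sensitive log-power model y = x + log(1 + r^2 + 2*alpha*r), r = e^((n-x)/2)."""
    r = np.exp((n - x) / 2.0)
    return x + np.log1p(r * r + 2.0 * alpha * r)


def taylor_moments(x, n, sigma_alpha, order=2):
    """Mean and variance of y from a Taylor expansion in alpha (alpha ~ N(0, sigma_alpha^2))."""
    r = np.exp((n - x) / 2.0)
    d1 = 2.0 * r / (1.0 + r * r)              # dy/dalpha at alpha = 0
    d2 = -4.0 * r * r / (1.0 + r * r) ** 2    # d2y/dalpha2 at alpha = 0
    s2 = sigma_alpha ** 2
    mean = y_of_alpha(x, n, 0.0)
    var = d1 * d1 * s2                        # linear term: variance only
    if order >= 2:
        mean += 0.5 * d2 * s2                 # E[alpha^2] = s2 shifts the mean
        var += 0.5 * d2 * d2 * s2 * s2        # Var(alpha^2) = 2 * s2^2 for Gaussian alpha
    return mean, var


def monte_carlo_moments(x, n, sigma_alpha, num_samples=200_000, seed=0):
    """Moment-matched Gaussian for y obtained by sampling alpha."""
    rng = np.random.default_rng(seed)
    alpha = np.clip(rng.normal(0.0, sigma_alpha, num_samples), -0.99, 0.99)
    y = y_of_alpha(x, n, alpha)
    return y.mean(), y.var()


for label, moments in [
    ("linear", taylor_moments(0.0, 0.0, 0.1, order=1)),
    ("second order", taylor_moments(0.0, 0.0, 0.1, order=2)),
    ("Monte Carlo", monte_carlo_moments(0.0, 0.0, 0.1)),
]:
    print(f"{label:>12}: mean={moments[0]:.5f} var={moments[1]:.6f}")
```

With speech and noise at equal power (x = n), the second-order mean sits slightly below the linear one, and the Monte Carlo estimate lands close to the second-order result for small sigma_alpha.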
...[3], we achieved recognition accuracies of 85....
Cites methods from this paper:
...For VTS, the cross term can be found using [62, 43]...
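In the power domain, the cross term is exactly what the phase-insensitive model drops: for complex STFT coefficients X and N, |X + N|² = |X|² + |N|² + 2|X||N|cos θ, with θ the phase difference. The sketch below checks this identity and, as one common choice (assumed here, not taken from [62, 43]), pools it over a mel filter with hypothetical non-negative weights w, normalising the pooled cross term into a phase factor α that lies in [−1, 1] by the Cauchy-Schwarz inequality.

```python
import numpy as np


def cross_term(X, N):
    """Per-bin cross term 2|X||N|cos(theta), theta = phase(X) - phase(N)."""
    theta = np.angle(X) - np.angle(N)
    return 2.0 * np.abs(X) * np.abs(N) * np.cos(theta)


def mel_phase_factor(X, N, w):
    """Hypothetical pooled phase factor for one filter with weights w (w >= 0)."""
    px = np.sum(w * np.abs(X) ** 2)           # filtered clean power
    pn = np.sum(w * np.abs(N) ** 2)           # filtered noise power
    pooled_cross = np.sum(w * cross_term(X, N)) / 2.0
    return pooled_cross / np.sqrt(px * pn)


rng = np.random.default_rng(1)
X = rng.normal(size=8) + 1j * rng.normal(size=8)
N = rng.normal(size=8) + 1j * rng.normal(size=8)
w = np.linspace(0.2, 1.0, 8)

# Per-bin identity: |X + N|^2 = |X|^2 + |N|^2 + cross term
print(np.allclose(np.abs(X + N) ** 2, np.abs(X) ** 2 + np.abs(N) ** 2 + cross_term(X, N)))

# Filtered powers: Y_mel = X_mel + N_mel + 2 * alpha * sqrt(X_mel * N_mel)
alpha = mel_phase_factor(X, N, w)
x_mel, n_mel = np.sum(w * np.abs(X) ** 2), np.sum(w * np.abs(N) ** 2)
print(alpha, np.isclose(np.sum(w * np.abs(X + N) ** 2),
                        x_mel + n_mel + 2 * alpha * np.sqrt(x_mel * n_mel)))
```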
References
Referenced for background in this paper:
...One class of techniques that addresses this problem consists of model-based techniques that either modify the back-end statistical models [1, 2] or compensate the observed acoustic feature vectors using estimates of the clean speech and/or the background (noise) model parameters [3, 4, 5, 6]....
Additional excerpts
...Instead, the back-end acoustic model, which is more detailed than the front-end, can use a larger context in the decision process [6]....
Additional excerpts
...Model-Based Feature Enhancement (MBFE) is a scalable and efficient technique to jointly reduce the interfering additive and convolutional noise from a noisy speech utterance before recognition by an ASR system [7, 9]....
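Model-based enhancement of this kind replaces each noisy feature vector by a posterior-weighted estimate of the clean speech under a Gaussian mixture prior. As a minimal sketch only, assuming a linear additive model y = x + n with Gaussian noise and diagonal covariances (MBFE itself uses the non-linear environment model, so this is a simplification and not the method of [7, 9]), the MMSE estimate is x̂ = Σ_k P(k|y) E[x|y,k]. All names are hypothetical.

```python
import numpy as np


def mmse_enhance(y, weights, means, variances, noise_mean, noise_var):
    """MMSE clean-speech estimate for one frame under a linear-Gaussian simplification.

    y: (D,) noisy features; weights: (K,) GMM weights;
    means, variances: (K, D) diagonal clean-speech GMM; noise_mean, noise_var: (D,).
    """
    obs_mean = means + noise_mean                  # mean of y given component k
    obs_var = variances + noise_var                # variance of y given component k
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * obs_var)
                            + (y - obs_mean) ** 2 / obs_var, axis=1)
    log_post = np.log(weights) + log_lik
    post = np.exp(log_post - log_post.max())       # subtract max for numerical stability
    post /= post.sum()                             # P(k | y)
    gain = variances / obs_var                     # per-dimension Wiener-style gain
    cond_means = means + gain * (y - obs_mean)     # E[x | y, k]
    return post @ cond_means


weights = np.array([0.6, 0.4])
means = np.array([[0.0, 1.0], [3.0, -1.0]])
variances = np.ones((2, 2))
x_hat = mmse_enhance(np.array([2.5, 0.0]), weights, means, variances,
                     noise_mean=np.array([0.5, 0.5]), noise_var=np.full(2, 0.5))
print(x_hat)
```

Subtracting the maximum log-posterior before exponentiating keeps the component weights well defined even when all likelihoods are tiny.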
...The corresponding update formula is given by [9]: $\delta h = \left[ \sum_t \sum_{(i,j)} \gamma_t^{(i,j)} \, F'^{(i,j)} \left( \Sigma_x^{(i,j)} \right)^{-1} F^{(i,j)} \right]^{-1}$ ....
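Only the first factor of this update is recoverable from the excerpt; the factor it multiplies is cut off. A minimal sketch of that bracketed matrix, assuming F′ denotes the transpose of F^(i,j), that the inverted quantity is a covariance matrix (written Sigma here), and that the component pairs (i, j) are flattened into one index k; the array layout and names are hypothetical. `np.linalg.solve` is used instead of forming each inverse explicitly.

```python
import numpy as np


def update_precision_inverse(gamma, F, Sigma):
    """Return [sum_t sum_k gamma[t, k] * F[t, k]^T Sigma[k]^{-1} F[t, k]]^{-1}.

    gamma: (T, K) posteriors of the component pairs (i, j), flattened to k;
    F: (T, K, D, D) matrices; Sigma: (K, D, D) covariance matrices.
    """
    T, K = gamma.shape
    D = Sigma.shape[-1]
    A = np.zeros((D, D))
    for t in range(T):
        for k in range(K):
            # F^T Sigma^{-1} F without explicitly inverting Sigma
            A += gamma[t, k] * (F[t, k].T @ np.linalg.solve(Sigma[k], F[t, k]))
    return np.linalg.inv(A)


T, K, D = 4, 2, 3
rng = np.random.default_rng(0)
gamma = rng.dirichlet(np.ones(K), size=T)   # each frame's posteriors sum to 1
F = np.tile(np.eye(D), (T, K, 1, 1))
Sigma = np.stack([np.eye(D), 2.0 * np.eye(D)])
print(update_precision_inverse(gamma, F, Sigma))
```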
...First, we showed how the phase difference between speech and noise (which is often neglected in the acoustic environment model) gives rise to an additional term in the calculation of the covariance matrices for the noisy speech....
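A minimal sketch of how a phase term can enter the noisy-speech covariance, assuming first-order VTS in the log-mel domain around the means of x and n and around α = 0, with α zero-mean, independent across channels and of x and n, and with per-channel variance var_alpha (all assumptions; names are hypothetical). The linearised model contributes J_x Σ_x J_x^T + J_n Σ_n J_n^T as usual, and the phase factor adds a diagonal term (∂y/∂α)² var_alpha, an additional term of the kind the excerpt describes.

```python
import numpy as np


def noisy_covariance(mu_x, mu_n, cov_x, cov_n, var_alpha):
    """First-order VTS covariance of log-mel noisy speech, without and with the phase term."""
    d = mu_n - mu_x
    e = np.exp(d)
    jac_n = np.diag(e / (1.0 + e))                  # dy/dn at alpha = 0
    jac_x = np.eye(len(mu_x)) - jac_n               # dy/dx = 1 - dy/dn (both diagonal)
    g_alpha = 2.0 * np.exp(d / 2.0) / (1.0 + e)     # dy/dalpha at alpha = 0
    cov_y = jac_x @ cov_x @ jac_x.T + jac_n @ cov_n @ jac_n.T
    phase_term = np.diag(g_alpha ** 2 * var_alpha)  # extra variance from the phase factor
    return cov_y, cov_y + phase_term


mu_x = np.array([2.0, 1.0, 0.0])
mu_n = np.array([0.0, 1.0, 2.0])
cov_x = np.diag([1.0, 0.5, 0.8])
cov_n = np.diag([0.2, 0.2, 0.2])
without_phase, with_phase = noisy_covariance(mu_x, mu_n, cov_x, cov_n, var_alpha=0.05)
print(np.diag(with_phase - without_phase))  # largest where speech and noise have equal power
```

Because ∂y/∂α = 1/cosh((n − x)/2), the extra variance peaks when speech and noise have equal power and vanishes when either dominates.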
...The speaker-independent LVCSR system developed by the ESAT speech group of K.U.Leuven is used as the back-end recogniser (details can be found in [9])....