Q2. How is the likelihood of Yt:τ determined?
The likelihood of the match between the two segments Yt:τ and Sζ:η is determined by optimizing the parameters g, qk and hk at the segment level, assuming stationary noise and a constant channel characteristic within the segment.
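To illustrate the idea of "optimizing at the segment level", here is a minimal sketch in which one set of parameters is shared by every frame of the segment and chosen to maximize the summed frame log-likelihood. The model form `g * h * s + q` (g a scalar speech gain, h a per-band channel gain, q a per-band noise power), the Gaussian-on-log-power score, and the candidate grids are all hypothetical assumptions for illustration; the paper's actual likelihood and parameterization of g, qk and hk may differ.

```python
import itertools
import numpy as np

def frame_loglik(y, s, g, q, h, var=1.0):
    """Hypothetical per-frame score: Gaussian on log power, model = g*h*s + q."""
    model = g * h * s + q
    d = np.log(y) - np.log(model)
    return float(-0.5 * np.sum(d ** 2 / var + np.log(2 * np.pi * var)))

def segment_loglik(Y, S, gains, noise_spectra, channels):
    """Maximize the summed frame log-likelihood over ONE (g, q, h) for the segment.

    Y, S: (T, K) noisy and corpus power spectra, frame-aligned.
    gains: candidate scalar speech gains g (hypothetical grid).
    noise_spectra, channels: candidate (K,) arrays for q and h (hypothetical).
    """
    best, best_params = -np.inf, None
    for g, q, h in itertools.product(gains, noise_spectra, channels):
        # The same parameters are used in every frame: stationary noise and a
        # constant channel across the segment.
        ll = sum(frame_loglik(y, s, g, q, h) for y, s in zip(Y, S))
        if ll > best:
            best, best_params = ll, (g, q, h)
    return best, best_params
```

An exhaustive grid is used only to keep the sketch short and transparent; any optimizer over the shared parameters would express the same segment-level idea.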
Q3. What is the likelihood of Yt:τ associated with Sζ:η?
Given a noisy segment Yt:τ, p(Yt:τ | λSζ:η) is the likelihood of the noisy segment, assuming stationary noise, with a correspondingly matched corpus segment Sζ:η, subject to a time-invariant channel factor.
Q4. How do the authors incorporate compensation into the corpus speech model?
By introducing channel and noise compensation into the corpus GMM, the authors introduce the compensation into all the corpus utterances that are built on this GMM and used for finding the matched segments.
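Why compensating the GMM is enough can be seen in a small sketch: if each corpus utterance is stored as a sequence of GMM component indices, then compensating the component means once changes every utterance that refers to them. The compensation rule `h * mean + q` in the linear power domain and the function name `compensate_gmm_means` are hypothetical assumptions, not the paper's exact formulation.

```python
import numpy as np

def compensate_gmm_means(means, h, q):
    """Hypothetical compensation of GMM component means.

    means: (M, K) linear power spectra of the M components.
    h, q: (K,) channel gain and noise power per band.
    Broadcasting applies the same channel and noise to every component.
    """
    return means * h + q

means = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([0.5, 1.0])
q = np.array([0.1, 0.2])
compensated = compensate_gmm_means(means, h, q)

# A corpus utterance represented (hypothetically) as component indices:
# indexing the compensated means yields the compensated utterance directly.
utterance = [0, 1, 1]
compensated_utterance = compensated[utterance]
```

The design point is that compensation costs one operation per component rather than one per corpus frame.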
Q5. What is the noise model used to calculate the likelihood of the measurement?
This new noise model was used as an alternative to the white noise model in calculating the likelihood of the measurement in (3) and (5); of the two noise models, whichever produced the larger likelihood was used.
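The selection rule itself (evaluate the likelihood under each noise model and keep the larger) can be sketched as follows. Equations (3) and (5) are not reproduced here; the Gaussian-on-log-power score and the additive model `clean_power + noise` are hypothetical stand-ins, and the function names are illustrative only.

```python
import numpy as np

def gaussian_loglik(y, model, var=1.0):
    """Hypothetical likelihood score: Gaussian on log power."""
    d = np.log(y) - np.log(model)
    return float(-0.5 * np.sum(d ** 2 / var + np.log(2 * np.pi * var)))

def best_noise_model(y, clean_power, noise_models):
    """Score the measurement under each candidate noise spectrum; keep the best.

    noise_models: dict mapping a name to a (K,) noise power spectrum.
    Returns (name, log-likelihood) of the model with the larger likelihood.
    """
    scores = {name: gaussian_loglik(y, clean_power + n)
              for name, n in noise_models.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

For example, when the measurement contains strongly colored noise, a flat (white) spectrum with the same mean power fits worse, so the colored model is selected.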
Q6. How does the LMS approach find the longest noisy segment?
The authors propose the longest matching segment (LMS) approach: at each time t, they find the longest noisy segment starting at t that can be assumed to have stationary noise and that has a correspondingly matched corpus speech segment, subject to a constant channel factor.
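The search structure of LMS (for a fixed start t, try every end point and keep the longest one that passes the matching test) can be sketched with a pluggable acceptance test. The toy test below, exact substring matching between symbol strings, is a hypothetical stand-in for the paper's likelihood-based test under stationary noise and a constant channel; all names are illustrative.

```python
from typing import Callable, Optional

def longest_matching_segment(t: int, T: int,
                             accept: Callable[[int, int], bool]) -> Optional[int]:
    """Return the largest end index tau (t <= tau < T) whose segment t..tau
    is accepted as matching some corpus segment, or None if none is."""
    best = None
    for tau in range(t, T):
        # Every candidate length is checked, because a likelihood-based
        # test is not guaranteed to be monotonic in the segment length.
        if accept(t, tau):
            best = tau
    return best

def substring_accept(noisy: str, corpus: str) -> Callable[[int, int], bool]:
    """Toy acceptance test: the noisy segment occurs verbatim in the corpus."""
    return lambda t, tau: noisy[t:tau + 1] in corpus
```

With `noisy = "bcdq"` and `corpus = "abcdexyz"`, the longest matching segment from t = 0 ends at index 2 ("bcd"), while from t = 3 no segment matches.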
Q7. How are the smoothed channel and noise estimates used?
The smoothed channel and noise estimates can be used to modify the wideband, clean corpus speech model to reduce its mismatch with the noisy measurement, or to reduce the level of distortion in the noisy measurement, thereby reducing the error in segment matching.
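The second use (reducing distortion in the measurement) can be sketched as smoothing per-frame estimates over time and then undoing the noise and channel. Both steps are swapped-in techniques chosen for illustration: first-order recursive smoothing with a hypothetical factor `alpha`, followed by power spectral subtraction and channel equalization with a hypothetical spectral floor. The paper's smoothing and correction methods may differ.

```python
import numpy as np

def smooth(estimates, alpha=0.9):
    """First-order recursive smoothing over frames (hypothetical choice).

    estimates: (T, K) per-frame channel or noise estimates.
    """
    out = np.empty_like(estimates, dtype=float)
    out[0] = estimates[0]
    for t in range(1, len(estimates)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * estimates[t]
    return out

def reduce_distortion(Y, h_s, q_s, floor=1e-3):
    """Spectral subtraction of the smoothed noise, then channel equalization.

    Y, h_s, q_s: (T, K) noisy power, smoothed channel gain, smoothed noise power.
    The floor keeps the subtracted power positive.
    """
    return np.maximum(Y - q_s, floor * Y) / h_s
```

If the channel and noise are truly constant, smoothing leaves them unchanged, and the correction recovers the clean power spectrum exactly.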
Q8. What is the effect of the speech gain resolution on the search?
This is subject to the constraint that the power of the model of speech plus noise should not exceed the power of the noisy measurement; using the speech gain resolution to quantize the noise gain range for the search reduces the amount of computation for (3) and (5).
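The effect of the constraint and of the shared gain resolution can be sketched as a grid search in which speech and noise gains use the same dB step and combinations whose modelled power exceeds the measured power are skipped. The dB grid, the range `max_db`, the scalar powers and the `score` callable are hypothetical assumptions; the sketch only counts evaluations to show how the grid size drives computation.

```python
import numpy as np

def constrained_gain_search(P_y, P_s, P_n, score, step_db=1.0, max_db=20.0):
    """Search speech gain gs and noise gain gn on one shared dB grid.

    Only pairs with gs*P_s + gn*P_n <= P_y are evaluated.
    Returns the best (gs, gn) pair and the number of score evaluations.
    """
    grid_db = np.arange(-max_db, max_db + step_db / 2, step_db)
    gains = 10.0 ** (grid_db / 10.0)  # ascending linear power gains
    best, best_pair, evaluated = -np.inf, None, 0
    for gs in gains:
        if gs * P_s > P_y:
            break  # ascending grid: every larger speech gain violates too
        for gn in gains:
            if gs * P_s + gn * P_n > P_y:
                break  # every larger noise gain violates the constraint too
            evaluated += 1
            val = score(gs, gn)
            if val > best:
                best, best_pair = val, (gs, gn)
    return best_pair, evaluated
```

Running the same search with a coarser step (for example 2 dB instead of 1 dB) evaluates fewer gain pairs, which is the computational saving the text describes.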
Q9. How many utterances are in the second part of the test data?
Like the first part of the test data, the second part includes six test sets with both noise and channel distortion, plus one test set with channel distortion only and no noise; each test set contains 330 utterances.
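The total number of utterances in the second part follows directly from these counts:

$$ (6 + 1)\ \text{test sets} \times 330\ \text{utterances per set} = 2310\ \text{utterances} $$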