Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise
Summary
Introduction
- Experiments in [28] found that handgrip strength is a predictor of mortality and morbidity in both men and women, predicting up to 5 diseases.
- Grip strength is correlated with overall body strength, muscles and health status.
GGBMI =
- This formula would predict a maximum normal weight of 87 kg for somebody with a handgrip strength of 108 kg and a height of 1.8 m.
- I believe this is correct in general for boxers, wrestlers and gymnasts. It predicts normal weight in the sense of normal body fat percentage, but the competitive weight of world-class gymnasts is lower than the weight predicted by this formula.
- For a 1.7 m athlete with a handgrip strength of 108 kg, this formula would predict a maximum normal weight of 79 kg, which is reasonable, but of course the optimal competitive weight may be lower.
- Even at 79 kg, such an athlete would not be fat or overweight.
- It is possible to find an even more general formula.
GGBMI =
- It is possible to develop an optimal weight equation using an exponent of 1.8 instead of 2; the reason is explained in [2].
- Any of these formulae is better than the BMI alone, and there is a lot of evidence for this; in some cases the improvement is by a large margin.
- Normalization could be obtained through division by 100, in which case the authors obtain a smaller factor related to grip strength.
- I consider the previous formula better, but there is also the possibility of dividing by 100 instead of dividing by 54, and then:
AGGBMI =
- Engineering an optimal formula is also achieved through trial and error.
- I also developed and tested formulae based on similar principles, such as AGBMI = weight / (H^2 + (grip_strength − 54) / weight), weight / (H^1.8 + (grip_strength − 54) / weight), weight / (H^2 + (grip_strength − 54) / 100), weight 2 × chest −.
AGBMI =
- Therefore it is possible to develop and test the following formulae: Weight Height2× chest −.
- For a strength on this move equivalent to 50 kg, the maximum predicted weight is 84 kg; for 60 kg it is 89 kg; and for 80 kg, the maximum normal weight according to this formula would be 97 kg.
SGBMI =
- This formula would predict a maximum normal weight of as much as 119 kg for a 120 kg bar crunch.
- Of course, it would be more correct to use a force rather than a weight, but this is how bars are sold, and the authors can draw an equivalence to a lifting move such as the dumbbell bench press, where the weights are measured in kg.
- The advantage of this move is that it can be tested with very simple equipment, a short 50 cm bar, in a medical office or at home.
- This formula has the BMI as a particular case but works both ways: for stronger people it allows a higher weight, while for weaker people it allows less weight than the classic BMI.
- It is possible to define SAGBMI, the strength and anthropometric generalization of BMI.
SAGBMI =
- In the same way, I develop a number of formulae based on ideas, cited experiments and principles I developed, then test the formulae with test cases, simulate them and present them so that people who design experimental studies can verify these formulae on a large number of cases, on a statistical basis.
- A treatise on Man and the Development of His Faculties.
Citations
115 citations
Cites methods from "Mel cepstral coefficient modificati..."
...The first two Mel cepstral coefficients were modified (excluding the log-energy coefficient) in order to maximise intelligibility of speech in noise as given by an approximated version of the glimpse proportion measure (Cooke, 2006; Valentini-Botinhao et al., 2012a)....
[...]
...To create the ‘TTSGP’ type a Mel cepstral coefficient modification method (Valentini-Botinhao et al., 2012b) was applied to the spectral parameters generated by the TTS type....
[...]
...…audio power reallocation based on the Speech Intelligibility Index (Sauert and Vary, 2010, 2011) or glimpse proportion (Tang and Cooke, 2012), cepstral extraction based on the glimpse proportion measure (Valentini-Botinhao et al., 2012a), and the insertion of small pauses (Tang and Cooke, 2011)....
[...]
73 citations
Cites methods from "Mel cepstral coefficient modificati..."
...To enhance the spectral envelope a noise-dependent optimisation based on the glimpse proportion measure was performed [29]....
[...]
28 citations
Cites methods from "Mel cepstral coefficient modificati..."
...We then proposed a method to extract cepstral coefficients which maximized the GP measure (Valentini-Botinhao et al., 2012a)....
[...]
...Our solution to this was to modify the generated speech instead (Valentini-Botinhao et al., 2012b), by modifying the Mel cepstral coefficients....
[...]
26 citations
References
248 citations
114 citations
"Mel cepstral coefficient modificati..." refers background in this paper
...One way in which this can be done is by using an intelligibility measure of speech [2]....
[...]
51 citations
39 citations
"Mel cepstral coefficient modificati..." refers background in this paper
...We have observed that the Glimpse Proportion (GP) measure for speech intelligibility in noise [3] has a high correlation coefficient with subjective intelligibility scores for HMM-generated synthetic speech whose spectral envelope has been modified [4]....
[...]
14 citations
"Mel cepstral coefficient modificati..." refers background or methods in this paper
...In [5] we showed how to approximate the GP measure in a way that provides a closed and differentiable formulation:...
[...]
...As predicted by our hypothesis that distortions were defeating potential gains in intelligibility in our previously-published experiments [5], the voices where we modify only the first few Mel cepstral coefficients achieved a better WAR, indicating that very fine frequency modifications cause distortions that cancel out any potential intelligibility gain they may offer....
[...]
...We then proposed a cepstral extraction method based on the GP measure for the HMM-based synthesis framework [5]....
[...]
Frequently Asked Questions (10)
Q2. What future work is planned in "Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise"?
In future, the authors plan to investigate reallocating energy across time. The authors also plan to operate under a loudness constraint rather than an energy one.
Q3. How did the authors use the LTAS for the Lombard voice?
The authors used two stopping criteria: error convergence and a maximum distortion threshold, set to a 10% relative increase in the Euclidean distance between the STEP representations of the original and modified speech.
Q4. How did the authors extract the Mel cepstral coefficients?
To train, adapt and generate speech the authors extracted: 59 Mel cepstral coefficients with α = 0.77, Mel scale F0, and 25 aperiodicity energy bands extracted using STRAIGHT [8].
Q5. How many speakers did the authors use for the listening test?
For the listening test the authors used 32 native English speakers, who listened to the noisy samples over headphones in soundproof booths and typed in what they heard.
Q6. What is the GP measure for speech intelligibility in noise?
The Glimpse Proportion (GP) measure for speech intelligibility in noise [3] is the proportion of spectral-temporal regions called glimpses where speech is more energetic than noise.
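The definition above can be illustrated with a minimal Python sketch. It assumes the speech and noise have already been converted to time-frequency level arrays in dB on the same grid (channels × frames); the auditory front end that produces them is omitted. The function name `glimpse_proportion` and the `threshold_db` parameter are hypothetical, introduced only for illustration, and the default of 0 dB simply mirrors the "more energetic than noise" wording rather than any setting reported by the authors.

```python
import numpy as np

def glimpse_proportion(speech_db, noise_db, threshold_db=0.0):
    """Fraction of time-frequency cells where speech exceeds noise.

    speech_db, noise_db: arrays of levels in dB on the same
    (channels x frames) grid. This input format is an assumption.
    threshold_db: margin by which speech must exceed noise for a cell
    to count as a glimpse (hypothetical parameter, default 0 dB).
    """
    speech_db = np.asarray(speech_db, dtype=float)
    noise_db = np.asarray(noise_db, dtype=float)
    if speech_db.shape != noise_db.shape:
        raise ValueError("speech and noise must have the same shape")
    # A cell is a glimpse when the speech level is above the noise level
    # by more than the chosen margin.
    glimpses = speech_db > noise_db + threshold_db
    return float(glimpses.mean())

# Toy example: 2 channels x 3 frames, constant 50 dB noise.
speech = np.array([[60.0, 40.0, 55.0],
                   [30.0, 70.0, 45.0]])
noise = np.full_like(speech, 50.0)
gp = glimpse_proportion(speech, noise)
```

In this toy grid, three of the six cells have speech above the noise, so `gp` is 0.5. Note that this hard-threshold count is not differentiable; it is only meant to make the definition concrete, not to reproduce the approximated measure used in the paper.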
Q7. What can be done to improve intelligibility in noise?
If such data are not available, then the authors can apply noise-independent modifications at the feature level based on known acoustic properties of Lombard speech, such as F0 increase, flattening of spectral tilt and duration stretch [1].
Q8. What are the spectral parameters that define the excitation signal?
The intelligibility gains obtained by the full Lombard voice L over the N-L voice reflect the impact of changes in duration patterns, F0 and the aperiodicity parameters that define the excitation signal, as pointed out in Table 2.
Q9. Why did the authors use an average voice model?
The authors decided to use an average voice model rather than building a speaker-dependent voice because the normal speech dataset was not phonetically balanced.
Q10. What is the difference between the Lombard and the N-L voice?
Moreover the authors observed that, for the competing talker, the intelligibility gain obtained by the Lombard voice over the modified voice was mainly due to changes in duration, F0 and excitation parameters.