# Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise

...The first two Mel cepstral coefficients were modified (excluding the log-energy coefficient) in order to maximise intelligibility of speech in noise as given by an approximated version of the glimpse proportion measure (Cooke, 2006; Valentini-Botinhao et al., 2012a)....
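As a rough illustration of what a glimpse-proportion style measure counts, the sketch below computes the fraction of time-frequency cells in which the speech level exceeds the noise level by a fixed margin. It makes several assumptions that the excerpt does not confirm: the inputs are already spectro-temporal excitation levels in dB (the auditory front end is omitted), the margin is 3 dB, and the function names are hypothetical. The sigmoid variant is only one plausible way to obtain a smooth, differentiable count; it is not necessarily the approximation used in the cited work.

```python
import math


def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise by threshold_db.

    speech_db and noise_db are lists of frames, each frame a list of
    per-channel levels in dB (assumed to come from an auditory model).
    """
    total = 0
    glimpsed = 0
    for s_frame, n_frame in zip(speech_db, noise_db):
        for s, n in zip(s_frame, n_frame):
            total += 1
            if s > n + threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_glimpse_proportion(speech_db, noise_db, threshold_db=3.0, slope=1.0):
    """Smooth variant: the hard threshold is replaced by a sigmoid (assumption)."""
    total = 0
    acc = 0.0
    for s_frame, n_frame in zip(speech_db, noise_db):
        for s, n in zip(s_frame, n_frame):
            total += 1
            acc += _sigmoid(slope * (s - n - threshold_db))
    return acc / total if total else 0.0


# Toy example: 2 frames x 2 channels, levels in dB.
speech = [[10.0, 0.0], [5.0, 20.0]]
noise = [[0.0, 0.0], [3.0, 10.0]]
gp_hard = glimpse_proportion(speech, noise)
gp_soft = soft_glimpse_proportion(speech, noise, slope=50.0)
```

With a steep slope the smooth variant approaches the hard count, while a shallower slope gives usable gradients with respect to the speech levels, which is the property an optimisation over cepstral coefficients would need.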

...To create the ‘TTSGP’ type a Mel cepstral coefficient modification method (Valentini-Botinhao et al., 2012b) was applied to the spectral parameters generated by the TTS type....

...audio power reallocation based on the Speech Intelligibility Index (Sauert and Vary, 2010, 2011) or glimpse proportion (Tang and Cooke, 2012), cepstral extraction based on the glimpse proportion measure (Valentini-Botinhao et al., 2012a), and the insertion of small pauses (Tang and Cooke, 2011)....

...To enhance the spectral envelope a noise-dependent optimisation based on the glimpse proportion measure was performed [29]....

...We then proposed a method to extract cepstral coefficients which maximized the GP measure (Valentini-Botinhao et al., 2012a)....

...Our solution to this was to modify the generated speech instead (Valentini-Botinhao et al., 2012b), by modifying the Mel cepstral coefficients....

...To train, adapt and generate speech we extracted: 59 Mel cepstral coefficients with α = 0.77, Mel scale F0, and 25 aperiodicity energy bands extracted using STRAIGHT [8]....

...A further extension proposed in this paper is the possibility of using this method for Mel cepstral coefficients, which can provide higher speech quality with fewer coefficients [6]....

...We can represent the spectrum by $M$-th order Mel cepstral coefficients $\{c_m\}_{m=0}^{M}$ in the following manner [6]:...

...where α is a warping factor which can be chosen to represent, for instance, the Mel scale [6]....
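The equation itself is elided in the excerpt above. The sketch below therefore assumes the form commonly used in mel-cepstral analysis, $H(z) = \exp \sum_{m=0}^{M} c_m \tilde{z}^{-m}$ with the first-order all-pass substitution $\tilde{z}^{-1} = (z^{-1} - \alpha)/(1 - \alpha z^{-1})$. This may differ in detail from the cited paper. Under that assumption, the log-magnitude spectrum at frequency $\omega$ is $\sum_m c_m \cos(m\tilde{\omega})$, where $\tilde{\omega}$ is the warped frequency. The function names are hypothetical.

```python
import math


def warp_frequency(omega, alpha):
    """Warped frequency of the all-pass (z^-1 - alpha) / (1 - alpha z^-1).

    omega is in radians (0..pi); alpha is the warping factor, |alpha| < 1.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def log_magnitude(mcep, omega, alpha):
    """Log-magnitude spectrum at omega for coefficients c_0..c_M.

    Assumes H(z) = exp(sum_m c_m * z_tilde^-m), so the real part of
    log H(e^{j omega}) is sum_m c_m * cos(m * warped_omega).
    """
    w = warp_frequency(omega, alpha)
    return sum(c * math.cos(m * w) for m, c in enumerate(mcep))


# Example: alpha = 0.77 (the value quoted in one excerpt above)
# stretches the low-frequency region of the axis.
alpha = 0.77
warped_low = warp_frequency(0.5, alpha)
envelope = [log_magnitude([1.0, 0.5, -0.2], k * math.pi / 8, alpha)
            for k in range(9)]
```

Setting `alpha` to 0 leaves the frequency axis unwarped, in which case the coefficients reduce to an ordinary cepstrum. Positive values allocate more resolution to low frequencies, which is how a suitable α can approximate the Mel scale.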

