Statistics based features for unvoiced sound classification
TL;DR: This work investigates whether statistics obtained by decomposing sounds with a filter bank and computing the moments of the filter responses, along with their correlations, can be used as features for classifying unvoiced sounds.
Abstract: Unvoiced phonemes make up a significant share of spoken English. These phonemes are hard to classify because of their weak energy and lack of periodicity. Sound textures, such as the sound of a flowing stream of water or of falling raindrops, have aperiodic temporal properties similar to those of unvoiced phonemes, yet the human ear differentiates them easily. Recent studies on sound texture analysis and synthesis have shown that the human auditory system perceives sound textures using simple statistics. These statistics are obtained by decomposing sounds with a filter bank and computing the moments of the filter responses, along with their correlations. In this work we investigate whether these statistics, which are easy to extract, can also be used as features for classifying unvoiced sounds. To incorporate the moments and correlation values as features, a framework containing multiple classifiers is proposed. Experiments on the TIMIT dataset gave an accuracy on par with the latest results reported in the literature, at a lower computational cost.
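The feature extraction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter bank (ideal band-pass filters with log-spaced edges, applied in the FFT domain), taking moments of each subband's Hilbert envelope rather than of the raw response, the number of bands, the lowest band edge, and the function name `texture_statistics` are all hypothetical choices. The function returns the mean, variance, skewness and kurtosis of each subband envelope, followed by the pairwise correlations between envelopes.

```python
import numpy as np


def texture_statistics(x, fs, n_bands=8, f_min=100.0):
    """Hypothetical moment + correlation features from a filter-bank decomposition.

    Assumptions (illustrative, not the paper's settings): ideal band-pass
    filters with log-spaced edges implemented in the FFT domain, and
    moments computed on the Hilbert envelope of each subband.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    edges = np.geomspace(f_min, fs / 2.0, n_bands + 1)

    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Keep only the positive frequencies in [lo, hi) and double them:
        # the inverse FFT is then the analytic signal of that subband.
        mask = (freqs >= lo) & (freqs < hi)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    env = np.vstack(envelopes)  # shape (n_bands, n)

    # First four moments of each subband envelope.
    mean = env.mean(axis=1)
    centered = env - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var) + 1e-12  # guard against division by zero
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4

    # Pairwise correlations between subband envelopes (upper triangle only).
    corr = np.corrcoef(env)
    iu = np.triu_indices(n_bands, k=1)

    return np.concatenate([mean, var, skew, kurt, corr[iu]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000  # TIMIT audio is sampled at 16 kHz
    x = rng.standard_normal(1600)  # 100 ms of white noise as a stand-in segment
    features = texture_statistics(x, fs)
    print(features.shape)  # (60,)
```

With 8 bands the vector holds 4 × 8 = 32 moments plus 8 × 7 / 2 = 28 correlations. A cochlear-style (for example ERB-spaced) filter bank would be closer to the auditory models used in sound-texture work; ideal FFT-domain filters are used here only to keep the sketch dependent on NumPy alone. How the moment and correlation groups are distributed across the paper's multiple classifiers is not stated in the abstract, so the sketch stops at the feature vector.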
Citations
4 citations
Cites methods from "Statistics based features for unvoi..."
...For example in [18] a correct identification rate of 84% was reported, using a bark bands spectral representation and a canonical discriminant analysis, while in [19], a correct rate of 86....
1 citation
References
14,262 citations
291 citations
181 citations
"Statistics based features for unvoi..." refers background in this paper
...Unvoiced sounds, due to their low energy and noise-like structure, are hard to recognize....
73 citations
"Statistics based features for unvoi..." refers background in this paper
...These sounds make up 21.0% of the total phonemes spoken in the English language [1]....
60 citations
"Statistics based features for unvoi..." refers methods in this paper
...A non-linear manifold learning technique called diffusion maps was advocated to improve the classification accuracy....