Proceedings ArticleDOI

Statistics based features for unvoiced sound classification

TLDR
This work investigates whether statistics obtained by decomposing sounds with a set of filter banks and computing the moments of the filter responses, along with their correlation values, can be used as features for classifying unvoiced sounds.
Abstract
Unvoiced phonemes have a significant presence in spoken English. These phonemes are hard to classify because of their weak energy and lack of periodicity. Sound textures, such as the sound of a flowing stream of water or of falling raindrops, are aperiodic in the time domain in much the same way as unvoiced phonemes, yet the human ear differentiates them easily. Recent studies on sound texture analysis and synthesis have shown that the human auditory system perceives sound textures using simple statistics. These statistics are obtained by decomposing sounds with a set of filter banks and computing the moments of the filter responses, along with their correlation values. In this work we investigate whether these statistics, which are easy to extract, can also be used as features for classifying unvoiced sounds. To incorporate the moments and correlation values as features, a framework containing multiple classifiers is proposed. Experiments conducted on the TIMIT dataset gave an accuracy on par with the latest reported in the literature, at lower computational cost.
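
The statistics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter bank, so the ideal FFT-mask band-pass filters, the log-spaced band edges, the choice of eight bands, and the function names `subband_envelopes` and `texture_statistics` are all assumptions made for illustration. The sketch computes per-band envelope moments (mean, variance, skewness, kurtosis) and the pairwise correlations between band envelopes.

```python
import numpy as np

def subband_envelopes(x, sr, n_bands=8, f_lo=100.0):
    """Split x into log-spaced bands and return each band's envelope.

    Assumption: ideal (brick-wall) FFT masks stand in for the paper's
    unspecified filter bank.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / sr)
    edges = np.geomspace(f_lo, sr / 2.0, n_bands + 1)
    envelopes = np.empty((n_bands, n))
    for b in range(n_bands):
        # Keeping only positive frequencies (doubled) gives the analytic
        # signal of the band; its magnitude is the band envelope.
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes[b] = np.abs(analytic)
    return envelopes

def texture_statistics(envelopes):
    """Return per-band moments followed by cross-band correlations."""
    mean = envelopes.mean(axis=1)
    centered = envelopes - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    std = np.sqrt(var)
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4
    # Correlation coefficients between every pair of band envelopes;
    # only the upper triangle is kept because the matrix is symmetric.
    corr = np.corrcoef(envelopes)
    upper = corr[np.triu_indices(len(envelopes), k=1)]
    return np.concatenate([mean, var, skew, kurt, upper])

# Usage on a noise segment sampled at 16 kHz (the TIMIT sampling rate).
rng = np.random.default_rng(0)
segment = rng.standard_normal(16000)
features = texture_statistics(subband_envelopes(segment, sr=16000))
```

With eight bands this yields 4 × 8 moment values plus 28 pairwise correlations. How such groups of features would be routed to the multiple classifiers mentioned in the abstract is not described here, so the sketch stops at feature extraction.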

Citations
Proceedings ArticleDOI

Spectral and textural features for automatic classification of fricatives

TL;DR: Two dimensionality reduction algorithms, namely t-distributed Stochastic Neighbor Embedding and Sequential Forward Floating Selection, were used to obtain a compact representation of the data, and it is shown that representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
Patent

Online forecasting method for high-frequency mechanical noise of structure

TL;DR: An online forecasting method for the high-frequency mechanical noise of a structure, belonging to the technical field of noise forecasting, is disclosed; it can be applied in online forecasting engineering practice and has wide application prospects.

Proceedings ArticleDOI

Piano Multi-Pitch Estimator Using CNN-Stacked LSTM

TL;DR: In this article, a combination of a Convolutional Neural Network (ConvNet) and a Long Short-Term Memory (LSTM) neural network was used to perform pitch estimation with a 90.14% F1 score and an average user evaluation of 8.4 out of 10.
References
BookDOI

Analysis and synthesis of sound textures

TL;DR: In this paper, sound textures are treated as two-level phenomena: simple sound elements called atoms form the low level, and the distribution and arrangement of atoms form the high level.
Proceedings ArticleDOI

Classifying soundtracks with audio texture features

TL;DR: It is shown that the texture statistics perform as well as the best conventional statistics (based on MFCC covariance) and the relative contributions of the different statistics are examined, showing the importance of modulation spectra and cross-band envelope correlations.
Book ChapterDOI

Selective Phoneme Spotting for Realization of an /s, z, C, t/ Transposer

TL;DR: Hearing impaired people with severe sensory deficit urgently need a perception-based replacement for inaudible fricational features of /s, z, C, t/ to restore high-level breakdown of speech connectedness.