Proceedings Article•DOI•

Statistics based features for unvoiced sound classification

TL;DR: This work investigates if statistics obtained by decomposing sounds using a set of filter-banks and computing the moments of the filter responses, along with their correlation values can be used as features for classifying unvoiced sounds.
Abstract: Unvoiced phonemes have a significant presence in spoken English. These phonemes are hard to classify due to their weak energy and lack of periodicity. Sound textures, such as the sound made by a flowing stream of water or by falling droplets of rain, have aperiodic temporal properties similar to those of unvoiced phonemes. These sounds are easily differentiated by the human ear. Recent studies on sound texture analysis and synthesis have shown that the human auditory system perceives sound textures using simple statistics. These statistics are obtained by decomposing sounds using a set of filter-banks and computing the moments of the filter responses, along with their correlation values. In this work, we investigate whether the above-mentioned statistics, which are easy to extract, can also be used as features for classifying unvoiced sounds. To incorporate the moments and correlation values as features, a framework containing multiple classifiers is proposed. Experiments conducted on the TIMIT dataset gave an accuracy on par with the latest reported in the literature, at a lower computational cost.
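The statistics the abstract describes (per-band moments plus cross-band correlations) can be sketched roughly as follows. This is a hypothetical numpy illustration with a simple FFT-domain filter bank; the band edges, envelope estimate, and feature layout are assumptions, not the paper's actual front end:

```python
import numpy as np

def texture_features(x, sr=16000, n_bands=6):
    """Hypothetical texture-statistic features: decompose with a bank of
    FFT-domain band-pass filters, then take envelope moments
    (mean, std, skewness, kurtosis) and cross-band correlations."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    edges = np.geomspace(100.0, sr / 2, n_bands + 1)   # assumed log-spaced band edges
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xb = np.where((freqs >= lo) & (freqs < hi), X, 0)
        band = np.fft.irfft(Xb, len(x))                # band-limited signal
        envs.append(np.abs(band))                      # crude amplitude envelope
    E = np.array(envs)                                 # (n_bands, n_samples)
    mu = E.mean(axis=1)
    sd = E.std(axis=1) + 1e-12
    Z = (E - mu[:, None]) / sd[:, None]
    skew = (Z ** 3).mean(axis=1)
    kurt = (Z ** 4).mean(axis=1)
    corr = np.corrcoef(E)[np.triu_indices(n_bands, k=1)]  # pairwise band correlations
    return np.concatenate([mu, sd, skew, kurt, corr])

rng = np.random.default_rng(0)
feat = texture_features(rng.standard_normal(4096))     # 4*6 moments + 15 correlations
```

With 6 bands this yields a 39-dimensional vector (four moments per band plus 15 pairwise correlations), which would then feed the multi-classifier framework.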
Citations
Proceedings Article•DOI•
11 Apr 2014
TL;DR: Two dimensionality reduction algorithms, t-distributed Stochastic Neighbor Embedding and Sequential Forward Floating Selection, were used to obtain a compact representation of the data; representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, outperforming most results obtained in previous studies.
Abstract: Classification of unvoiced fricatives is an important stage in applications such as spoken term detection and audio-video synchronization, and in technologies for the hearing impaired. Due to their acoustic similarity, extraction of multiple features and construction of high-dimensional feature vectors are required for successful classification of these phonemes. In this study two dimensionality reduction algorithms, namely, t-distributed Stochastic Neighbor Embedding (t-SNE) and Sequential Forward Floating Selection (SFFS), were used to obtain a compact representation of the data. A classification stage (kNN or SVM) was then applied, in which we compared the identification rates between the original feature vector and the low-dimensional representation. A total of 1000 unvoiced fricatives (/s/, /sh/, /f/ and /th/) derived from the TIMIT speech database, containing 25000 short frames of 8 ms each, were used for the evaluation. We show that representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
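The feature-selection half of the pipeline above can be illustrated with a toy sketch. This uses plain greedy forward selection with a leave-one-out 1-NN criterion on synthetic data; it is a simplification of SFFS (which also includes backward "floating" steps), and all data, dimensions, and thresholds here are invented:

```python
import numpy as np

def knn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy, used as the selection criterion."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)               # exclude each point itself
    return np.mean(y[D.argmin(axis=1)] == y)

def forward_select(X, y, target_dim=3):
    """Greedy forward feature selection down to target_dim features."""
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < target_dim:
        scores = [knn_accuracy(X[:, chosen + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(1)
n = 120
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, 10))
X[:, 4] += 3.0 * y        # planted informative feature
X[:, 7] -= 2.0 * y        # planted informative feature
sel = forward_select(X, y, target_dim=3)
```

On this synthetic set the two planted features should be picked up, mirroring the paper's finding that a handful of well-chosen dimensions can carry most of the discriminative information.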

4 citations


Cites methods from "Statistics based features for unvoi..."

  • ...For example in [18] a correct identification rate of 84% was reported, using a bark bands spectral representation and a canonical discriminant analysis, while in [19], a correct rate of 86....


Patent•
23 Dec 2015
TL;DR: An online forecasting method for high-frequency mechanical noise of a structure, belonging to the technical field of noise forecasting, is disclosed; the method can be applied in online forecasting engineering practice and has wide application prospects.
Abstract: The invention discloses an online forecasting method for high-frequency mechanical noise of a structure, belonging to the technical field of noise forecasting. The method comprises the following steps: building a reasonable and effective constraint-and-load statistical energy analysis model for the engineering structure; obtaining the mass data of the various excited sub-systems from the statistical energy analysis model; measuring the vibration response data of the excited sub-systems in a test; calculating the energy data of the excited sub-systems by combining the mass data with the response data; obtaining the radiated-sound-power transfer mobility data corresponding to the excited sub-systems from the model; and finally calculating the radiated sound power of the structure, completing the online forecast. The method achieves rapid calculation from load to radiated sound power by exploiting the invariance of the system transfer mobility, solving the long-runtime problem of traditional algorithms and enabling rapid forecasting of structural mechanical noise. The method offers considerable accuracy with relatively short elapsed time, can be applied in online forecasting engineering practice, and has wide application prospects.
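The step sequence in the abstract (subsystem energy from mass and measured vibration response, then radiated power through a precomputed transfer mobility) reduces to a short calculation. The sketch below is a hypothetical illustration; the masses, velocities, and mobility values are invented placeholders, not values from the patent:

```python
import numpy as np

# Hypothetical SEA-style forecast: energy per excited subsystem from
# mass and RMS velocity response, then total radiated sound power via
# fixed energy-to-power transfer mobilities obtained offline from the model.
mass = np.array([12.0, 8.5, 20.0])        # subsystem masses (kg), assumed
v_rms = np.array([0.02, 0.05, 0.01])      # measured RMS velocities (m/s), assumed
mobility = np.array([0.3, 0.45, 0.2])     # energy-to-power mobilities (1/s), assumed

energy = mass * v_rms ** 2                # kinetic-energy estimate per subsystem (J)
radiated_power = np.sum(mobility * energy)  # total radiated sound power (W)
```

Because the mobilities are invariant and precomputed, only the cheap energy update runs online, which is the source of the claimed speedup.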

1 citation

DOI•
20 Jul 2022
TL;DR: This research obtains a model that can perform pitch estimation with a 90.14% F1 score and an average user evaluation of 8.4 out of 10.
Abstract: This research explores several variations of the automatic music transcription method, specifically the pitch estimation task. Pitch estimation here converts an acoustic piano recording into a digitally transcribed song format. First, several techniques, including the short-time Fast Fourier transform and the constant-Q transform, provide a spectrogram representation of a WAV piano recording. This is then fed into a combination of a Convolutional Neural Network (ConvNet) and a Long Short-Term Memory (LSTM) neural network. The resulting transcription is a digitally transcribed song in the form of a MIDI file. The MAESTRO dataset was used for training, with each training run varying the learning rate and the spectrogram representation. This research obtains a model that performs pitch estimation with a 90.14% F1 score and an average user evaluation of 8.4 out of 10.
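The spectrogram front end of this pipeline can be sketched as below; as a stand-in for the ConvNet+LSTM stage, frame-wise peak picking maps the dominant STFT bin to a MIDI note number. This is a toy illustration on a synthetic tone, not the paper's model, and the frame sizes are assumptions:

```python
import numpy as np

def stft_pitch_to_midi(x, sr=16000, n_fft=2048, hop=512):
    """Toy pitch front end: short-time FFT spectrogram, then frame-wise
    peak picking mapped to MIDI note numbers.  The learned ConvNet+LSTM
    stage of the paper is replaced here by a simple argmax."""
    window = np.hanning(n_fft)
    notes = []
    for start in range(0, len(x) - n_fft, hop):
        frame = x[start:start + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        f0 = np.argmax(mag) * sr / n_fft              # dominant frequency (Hz)
        notes.append(int(round(69 + 12 * np.log2(f0 / 440.0))))  # Hz -> MIDI
    return notes

sr = 16000
t = np.arange(sr) / sr
a4 = np.sin(2 * np.pi * 440.0 * t)                    # one second of A4
notes = stft_pitch_to_midi(a4, sr)
```

A real transcriber would replace the argmax with the trained network and emit note onsets/offsets into a MIDI file, but the spectrogram-in, note-numbers-out shape of the problem is the same.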
References
Journal Article•DOI•
Yoav Freund, Robert E. Schapire
01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting; the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems.
Abstract: In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
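The boosting algorithm from the second part of the abstract can be sketched with decision stumps as weak learners. This is a minimal AdaBoost-style illustration on synthetic data, not the authors' exact formulation; the dataset and round count are invented:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=20):
    """Minimal AdaBoost sketch: multiplicatively reweight examples and
    combine weak threshold stumps.  Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                        # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # stump weight
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)            # multiplicative weight update
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)        # diagonal decision boundary
model = adaboost_stumps(X, y)
acc = np.mean(predict(model, X) == y)
```

Note how no prior knowledge about stump quality is needed: each round's weight alpha is derived from its observed weighted error, which is the property the abstract highlights.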

15,813 citations

Journal Article•DOI•
08 Sep 2011-Neuron
TL;DR: The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations, and the synthesis methodology offers a powerful tool for their further investigation.

342 citations

Book•
01 Jan 1923

183 citations


"Statistics based features for unvoi..." refers background in this paper

  • ...Unvoiced sounds, due to their low energy and noise like structure, are hard to recognize....


Journal Article•DOI•
TL;DR: Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.
Abstract: Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: Segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.
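The segmentation stage described above can be caricatured in a few lines: a single energy envelope is split at threshold crossings, standing in for the paper's multiscale time-frequency onset/offset analysis. The envelope values and threshold below are invented for illustration:

```python
import numpy as np

def onset_offset_segments(env, thresh):
    """Toy segmentation: contiguous segments where a band's energy
    envelope exceeds a threshold, as a stand-in for the paper's
    multiscale onset/offset analysis."""
    active = env > thresh
    edges = np.diff(active.astype(int))
    onsets = list(np.where(edges == 1)[0] + 1)    # rising crossings
    offsets = list(np.where(edges == -1)[0] + 1)  # falling crossings
    if active[0]:
        onsets.insert(0, 0)                       # segment open at start
    if active[-1]:
        offsets.append(len(env))                  # segment open at end
    return list(zip(onsets, offsets))

env = np.array([0.0, 0.1, 0.9, 0.8, 0.2, 0.0, 0.7, 0.6, 0.1])
segs = onset_offset_segments(env, thresh=0.5)
```

In the actual system this runs per frequency channel and at multiple scales, and the resulting segments are then grouped by Bayesian classification of acoustic-phonetic features.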

74 citations


"Statistics based features for unvoi..." refers background in this paper

  • ...These sounds make up to 21.0% of the total phonemes spoken in English language [1]....


Journal Article•DOI•
TL;DR: A statistically guided, knowledge-based, acoustic-phonetic system for the automatic classification of fricatives in speaker-independent continuous speech is proposed, which uses an auditory-based front-end processing system and incorporates new algorithms for the extraction and manipulation of the acoustic- phonetic features that proved to be rich in their information content.
Abstract: In this article, the acoustic-phonetic characteristics of the American English fricative consonants are investigated from the automatic classification standpoint. The features studied in the literature are evaluated and new features are proposed. To test the value of the extracted features, a statistically guided, knowledge-based, acoustic-phonetic system for the automatic classification of fricatives in speaker-independent continuous speech is proposed. The system uses an auditory-based front-end processing system and incorporates new algorithms for the extraction and manipulation of the acoustic-phonetic features that proved to be rich in their information content. Classification experiments are performed using hard-decision algorithms on fricatives extracted from the TIMIT database continuous speech of 60 speakers (not used in the design/training process) from seven different dialects of American English. An accuracy of 93% is obtained for voicing detection, 91% for place of articulation detection, and 87% for the overall classification of fricatives.

61 citations


"Statistics based features for unvoi..." refers methods in this paper

  • ...A non-linear manifold learning technique called diffusion maps was advocated to improve the classification accuracy....
