Proceedings Article

Spectral and textural features for automatic classification of fricatives

TLDR
Two dimensionality reduction algorithms, namely t-distributed Stochastic Neighbor Embedding and Sequential Forward Floating Selection, were used to obtain a compact representation of the data. Representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
Abstract
Classification of unvoiced fricatives is an important stage in applications such as spoken term detection and audio-video synchronization, and in technologies for the hearing impaired. Because these phonemes are acoustically similar, successful classification requires extracting multiple features and constructing high-dimensional feature vectors. In this study, two dimensionality reduction algorithms, namely t-distributed Stochastic Neighbor Embedding (t-SNE) and Sequential Forward Floating Selection (SFFS), were used to obtain a compact representation of the data. A classification stage (kNN or SVM) was then applied, in which we compared the identification rates obtained with the original feature vector and with the low-dimensional representation. A total of 1000 unvoiced fricatives (/s/, /sh/, /f/, and /th/) derived from the TIMIT speech database, containing 25000 short frames of 8 ms each, were used for the evaluation. We show that representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
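The selection-then-classification idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic (not TIMIT frames), the features are not the paper's spectral or textural features, the helper names `make_toy_data`, `knn_loo_accuracy`, and `sffs` are hypothetical, and leave-one-out kNN accuracy is simply one plausible selection criterion. The sketch shows a floating forward search picking 3 features out of 6 for a 4-class problem.

```python
import random

def make_toy_data(n_per_class=20, n_noise=3, seed=0):
    """Synthetic 4-class data: features 0-2 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(4):  # four classes, standing in for four fricatives
        for _ in range(n_per_class):
            informative = [3.0 * (c >= 2), 3.0 * (c == 1), 3.0 * (c == 3)]
            row = [v + rng.gauss(0, 0.5) for v in informative]
            row += [rng.gauss(0, 1.0) for _ in range(n_noise)]
            X.append(row)
            y.append(c)
    return X, y

def knn_loo_accuracy(X, y, feats, k=3):
    """Leave-one-out kNN accuracy using only the feature indices in `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in feats), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = [label for _, label in dists[:k]]
        # Majority vote; ties go to the smallest label for determinism.
        pred = max(sorted(set(votes)), key=votes.count)
        correct += pred == y[i]
    return correct / len(X)

def sffs(X, y, target_dim, score=knn_loo_accuracy):
    """Sequential forward floating selection; returns (score, feature list)."""
    n_features = len(X[0])
    selected = []
    best = {}  # subset size -> (best score seen, feature list)
    while len(selected) < target_dim:
        # Forward step: add the single feature that helps most.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: score(X, y, selected + [f]))
        selected = selected + [f_add]
        s = score(X, y, selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop features while the reduced subset beats
        # the best subset of that size seen so far.
        while len(selected) > 2:
            f_drop = max(
                selected,
                key=lambda f: score(X, y, [g for g in selected if g != f]),
            )
            reduced = [g for g in selected if g != f_drop]
            s_red = score(X, y, reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best[target_dim]

if __name__ == "__main__":
    X, y = make_toy_data()
    acc, feats = sffs(X, y, target_dim=3)
    print("selected features:", sorted(feats), "LOO accuracy:", round(acc, 3))
```

In this toy setup the noise features carry no class information, so the search is expected to settle on the three informative columns. With real speech features the selected subset and the resulting accuracy would depend on the data and on the chosen criterion.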


Citations
Book Chapter

Baby Cry Detection: Deep Learning and Classical Approaches

TL;DR: This chapter designs and evaluates several convolutional neural network architectures for baby cry detection, and compares their performance to that of classical machine-learning approaches, such as logistic regression and support vector machines.
Dissertation

Clustering classification and human perception of automotive steering wheel transient vibrations

TL;DR: The results obtained allowed us to assess the importance of knowing the carrier and removal status of canine coronavirus, as a source of infection for other animals, not necessarily belonging to the same breeds.
Journal Article

Analyzing cognitive processes from complex neuro-physiologically based data: some lessons

TL;DR: This paper describes the authors' experience working on several examples at the edge of the capabilities of machine learning systems, and the various methodologies needed to overcome these sorts of challenges.
Book Chapter

Analysis of the Influence of the Arabic Fricatives Vocalic Context on Their Spectral Parameters

TL;DR: In this paper, the influence of the right vowel context on the spectra of fourteen Arabic fricatives is analyzed using a system based on a neural network (multilayer perceptron, MLP).
References
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Journal Article

Floating search methods in feature selection

TL;DR: Sequential search methods characterized by a dynamically changing number of features included or eliminated at each step, henceforth "floating" methods, are presented and are shown to give very good results and to be computationally more effective than the branch and bound method.
Book

The Sounds of the World's Languages

TL;DR: The Sounds of the World's Languages is a survey of the speech sounds known to occur in the world's languages.
Proceedings Article

Stochastic Neighbor Embedding

TL;DR: This probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images, which allows ambiguous objects, like the document count vector for the word "bank", to have versions close to the images of both "river" and "finance" without forcing the image of outdoor concepts to be located close to those of corporate concepts.
Journal Article

Speech database development at MIT: Timit and beyond

TL;DR: The experiences of researchers at MIT in the collection of two large speech databases, TIMIT and VOYAGER, are described, which have somewhat complementary objectives.