Proceedings Article
Spectral and textural features for automatic classification of fricatives
Alex Frid, Yizhar Lavner +1 more
- pp 1-4
TLDR
Two dimensionality reduction algorithms, t-distributed Stochastic Neighbor Embedding and Sequential Forward Floating Selection, were used to obtain a compact representation of the data. Representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
Abstract:
Classification of unvoiced fricatives is an important stage in applications such as spoken term detection and audio-video synchronization, and in technologies for the hearing impaired. Because these phonemes are acoustically similar, classifying them successfully requires extracting multiple features and constructing high-dimensional feature vectors. In this study, two dimensionality reduction algorithms, t-distributed Stochastic Neighbor Embedding (t-SNE) and Sequential Forward Floating Selection (SFFS), were used to obtain a compact representation of the data. A classification stage (kNN or SVM) was then applied, in which we compared the identification rates of the original feature vector and the low-dimensional representation. A total of 1000 unvoiced fricatives (/s/, /sh/, /f/, and /th/) derived from the TIMIT speech database, containing 25000 short frames of 8 ms each, were used for the evaluation. We show that representing the data by a feature vector with as few as 3 dimensions yields a classification rate of almost 90%, which outperforms most of the results obtained in previous studies.
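The select-then-classify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for frame features (two informative dimensions and four noise dimensions, all hypothetical), the selection criterion is assumed to be leave-one-out kNN accuracy, and the helper names `make_toy_frames`, `knn_loo_accuracy`, and `sffs` are invented for illustration. It also assumes `target_dim` does not exceed the number of features.

```python
import random


def make_toy_frames(n_per_class=30, seed=0):
    """Synthetic stand-in for frame features: only columns 0 and 3 carry class information."""
    rng = random.Random(seed)
    centers = {"s": (0.0, 0.0), "sh": (0.0, 4.0), "f": (4.0, 0.0), "th": (4.0, 4.0)}
    X, y = [], []
    for label, (c0, c3) in centers.items():
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 3.0) for _ in range(6)]  # noise columns
            row[0] = rng.gauss(c0, 0.5)                    # informative
            row[3] = rng.gauss(c3, 0.5)                    # informative
            X.append(row)
            y.append(label)
    return X, y


def knn_loo_accuracy(X, y, feats, k=3):
    """Leave-one-out k-NN accuracy using only the feature indices in feats."""
    feats = list(feats)
    correct = 0
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if j != i:
                d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
                dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        # Ties are broken by taking the alphabetically first label.
        pred = max(sorted(votes), key=votes.get)
        correct += pred == y[i]
    return correct / len(X)


def sffs(X, y, target_dim, score=knn_loo_accuracy):
    """Sequential forward floating selection; returns (best score, feature indices)."""
    n_feats = len(X[0])
    selected = []
    best = {}  # subset size -> (score, feature list)
    while len(selected) < target_dim:
        # Forward step: add the remaining feature that maximizes the criterion.
        cands = [f for f in range(n_feats) if f not in selected]
        f_add = max(cands, key=lambda f: score(X, y, selected + [f]))
        selected = selected + [f_add]
        s = score(X, y, selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop features while that beats the best smaller subset seen so far.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score(X, y, [g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            s_red = score(X, y, reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best[target_dim]


if __name__ == "__main__":
    X, y = make_toy_frames()
    acc, feats = sffs(X, y, target_dim=2)
    print("selected features:", sorted(feats), "LOO accuracy:", acc)
    print("all 6 features, LOO accuracy:", knn_loo_accuracy(X, y, range(6)))
```

The last two lines mirror the comparison described in the abstract: the same classifier is scored once on the reduced representation and once on the full feature vector. An SVM or a t-SNE embedding could be substituted for the kNN criterion and the selection step, respectively; both are omitted here to keep the sketch dependency-free.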
Citations
Book Chapter
Baby Cry Detection: Deep Learning and Classical Approaches
TL;DR: This chapter designs and evaluates several convolutional neural network architectures for baby cry detection, and compares their performance to that of classical machine-learning approaches, such as logistic regression and support vector machines.
Dissertation
Clustering classification and human perception of automotive steering wheel transient vibrations
TL;DR: The results obtained allowed us to assess the importance of knowing the carrier and removal status of canine coronavirus, as a source of infection for other animals, not necessarily belonging to the same breeds.
Journal Article
Analyzing cognitive processes from complex neuro-physiologically based data: some lessons
TL;DR: This paper describes experience with several examples at the edge of the capabilities of machine learning systems, and the various methodologies needed to overcome these sorts of challenges.
Book Chapter
Analysis of the Influence of the Arabic Fricatives Vocalic Context on Their Spectral Parameters
TL;DR: This paper analyzes the influence of the right vowel context on the spectra of fourteen Arabic fricatives, using a system based on a neural network (multilayer perceptron, MLP).
References
Journal Article
Visualizing Data using t-SNE
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Journal Article
Floating search methods in feature selection
TL;DR: Sequential search methods characterized by a dynamically changing number of features included or eliminated at each step, henceforth "floating" methods, are presented and are shown to give very good results and to be computationally more effective than the branch and bound method.
Book
The Sounds of the World's Languages
Peter Ladefoged, Ian Maddieson +1 more
TL;DR: The Sounds of the World's Languages is a survey of the speech sounds found in the world's languages.
Proceedings Article
Stochastic Neighbor Embedding
Geoffrey E. Hinton, Sam T. Roweis +1 more
TL;DR: This probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images, which allows ambiguous objects, like the document count vector for the word "bank", to have versions close to the images of both "river" and "finance" without forcing the image of outdoor concepts to be located close to those of corporate concepts.
Journal Article
Speech database development at MIT: Timit and beyond
TL;DR: Describes the experiences of researchers at MIT in collecting two large speech databases, TIMIT and VOYAGER, which have somewhat complementary objectives.