Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have appeared within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers
14 Mar 2010
TL;DR: An initial attempt for phoneme recognition using structured SVM is presented, which was able to offer an absolute performance improvement of 1.33% over HMMs even with a highly simplified initial approach, probably because of the concept of maximized margin of SVM.
Abstract: Structured Support Vector Machine (SVM) is a recently developed extension of the very successful SVM approach, which can efficiently classify structured patterns with maximized margin. This paper presents an initial attempt for phoneme recognition using structured SVM. We simply learn the basic framework of HMMs in configuring the structured SVM. In the preliminary experiments with the TIMIT corpus, the proposed approach was able to offer an absolute performance improvement of 1.33% over HMMs even with a highly simplified initial approach, probably because of the concept of maximized margin of SVM. We see the potential of this approach because of the high generality, high flexibility, and high power of structured SVM.
16 citations
09 Jul 2006
TL;DR: This paper addresses the problem of unsupervised speaker change detection by testing three systems based on the Bayesian information criterion (BIC), a real-time approach employing the line spectral pairs and the BIC to validate a potential speaker change point.
Abstract: This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and the AudioWaveformEnvelope features, implements a dynamic thresholding followed by a fusion scheme, and finally applies BIC. The second method is a real-time one that uses a metric-based approach employing the line spectral pairs and the BIC to validate a potential speaker change point. The third method consists of three modules. In the first module, a measure based on second-order statistics is used; in the second module, the Euclidean distance and the Hotelling T² statistic are applied; and in the third module, the BIC is utilized. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, which is referred to as the TIMIT data set. A comparison between the performance of the three systems is made based on t-statistics.
16 citations
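The BIC validation step that the three systems above share can be illustrated with a small sketch. It is not the paper's implementation: it assumes each segment is modeled by a single Gaussian with diagonal covariance, uses synthetic two-dimensional frames instead of the line spectral pairs or MPEG-7 features named in the abstract, and the names `delta_bic`, `best_change_point`, `margin`, and `penalty_weight` are hypothetical.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance: sum of log per-dimension variances."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        column = [frame[d] for frame in frames]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(variance, 1e-12))  # floor avoids log(0)
    return total


def delta_bic(frames, t, penalty_weight=1.0):
    """Positive values favor two Gaussians (split at frame t) over one."""
    left, right = frames[:t], frames[t:]
    n, dim = len(frames), len(frames[0])
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    extra_params = 2 * dim  # one extra mean and variance per dimension
    return gain - penalty_weight * 0.5 * extra_params * math.log(n)


def best_change_point(frames, margin=10, penalty_weight=1.0):
    """Return (frame index, score) of the best split, or (None, score) if no split wins."""
    candidates = range(margin, len(frames) - margin + 1)
    score, t = max((delta_bic(frames, t, penalty_weight), t) for t in candidates)
    return (t, score) if score > 0 else (None, score)


# Synthetic demo: two "speakers" drawn from Gaussians with different means.
random.seed(0)
speaker_a = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
speaker_b = [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(100)]
change_frame, score = best_change_point(speaker_a + speaker_b)
print(change_frame, round(score, 1))
```

A full-covariance Gaussian is the more common choice in the BIC literature; the diagonal form is used here only to keep the sketch free of matrix code.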
TL;DR: This paper proposes a new method for feature extraction from the trajectory of the speech signal in the RPS using the multivariate autoregressive (MVAR) method and benefits from linear discriminant analysis (LDA) for dimension reduction.
16 citations
TL;DR: Experimental results have shown that the HOPE framework yields significant performance gains over the current state-of-the-art methods in various types of NN learning problems, including unsupervised feature learning, supervised or semi-supervised learning.
Abstract: In this paper, we propose a novel model for high-dimensional data, called the Hybrid Orthogonal Projection and Estimation (HOPE) model, which combines a linear orthogonal projection and a finite mixture model under a unified generative modeling framework. The HOPE model itself can be learned unsupervised from unlabelled data based on the maximum likelihood estimation as well as discriminatively from labelled data. More interestingly, we have shown the proposed HOPE models are closely related to neural networks (NNs) in the sense that each hidden layer can be reformulated as a HOPE model. As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work and, more importantly, to learn NNs in either supervised or unsupervised ways. In this work, we have investigated the HOPE framework to learn NNs for several standard tasks, including image recognition on MNIST and speech recognition on TIMIT. Experimental results have shown that the HOPE framework yields significant performance gains over the current state-of-the-art methods in various types of NN learning problems, including unsupervised feature learning, supervised or semi-supervised learning.
16 citations
01 Jan 2003
TL;DR: This work uses Mutual Information as a measure of the usefulness of individual time-frequency cells for various speech classification tasks and shows that selecting input features according to the mutual information criteria can provide a significant increase in classification accuracy.
Abstract: Information concerning the identity of subword units such as phones cannot easily be pinpointed because it is broadly distributed in time and frequency. Continuing earlier work, we use Mutual Information as a measure of the usefulness of individual time-frequency cells for various speech classification tasks, using the hand-annotations of the TIMIT database as our ground truth. Since different broad phonetic classes such as vowels and stops have such different temporal characteristics, we examine mutual information separately for each class, revealing structure that was not uncovered in earlier work; further structure is revealed by aligning the time-frequency displays of each phone at the center of their hand-marked segments, rather than averaging across all possible alignments within each segment. Based on these results, we evaluate a range of vowel classifiers over the TIMIT test set and show that selecting input features according to the mutual information criteria can provide a significant increase in classification accuracy.
16 citations
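The idea of ranking time-frequency cells by their mutual information with the phone label can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the authors estimated mutual information, so a simple histogram estimate over equal-width bins is used here, the toy data is synthetic, and the names `quantize`, `mutual_information`, and `rank_cells` are hypothetical.

```python
import math
from collections import Counter


def quantize(values, n_bins=8):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Histogram estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_cells(cells, labels, n_bins=8):
    """Return cell indices ordered from most to least informative about the labels."""
    scored = [
        (mutual_information(quantize(column, n_bins), labels), index)
        for index, column in enumerate(cells)
    ]
    return [index for _, index in sorted(scored, key=lambda item: -item[0])]


# Toy data: 100 frames, binary class label, three candidate cells.
labels = [i % 2 for i in range(100)]
cells = [
    [0.0] * 100,                              # constant: carries no information
    [float(label) for label in labels],       # copies the label: fully informative
    [float((i // 2) % 2) for i in range(100)],  # independent of the label
]
ranking = rank_cells(cells, labels)
print(ranking)
```

Keeping only the first few indices of `ranking` as classifier inputs would correspond to the feature selection step described in the abstract.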