Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 2011
TL;DR: Investigation of a multitask learning (MTL) approach for joint estimation of articulatory features, with and without phoneme classification as a subtask, shows that an MTL MLP can estimate articulatory features compactly and efficiently by learning the inter-feature dependencies through a common hidden-layer representation, irrespective of the number of subtasks.
Abstract: Speech sounds can be characterized by articulatory features. Articulatory features are typically estimated using a set of multilayer perceptrons (MLPs), i.e., a separate MLP is trained for each articulatory feature. In this report, we investigate a multitask learning (MTL) approach for joint estimation of articulatory features, with and without phoneme classification as a subtask. The effect of the number of subtasks in MTL is studied by selecting two different articulatory feature representations. Our studies show that an MTL MLP can estimate articulatory features compactly and efficiently by learning the inter-feature dependencies through a common hidden-layer representation, irrespective of the number of subtasks. Furthermore, adding phoneme classification as a subtask while estimating articulatory features improves both articulatory feature estimation and phoneme recognition. On the TIMIT phoneme recognition task, articulatory feature posterior probabilities obtained by the MTL MLP achieve a phoneme recognition accuracy of 73.8%, while the phoneme posterior probabilities achieve an accuracy of 74.2%.
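As a rough illustration of the shared-hidden-layer idea, the sketch below (plain NumPy; not the authors' code, and the feature groups, label counts, and dimensions are hypothetical) forwards a batch of acoustic frames through one common hidden layer that feeds a separate softmax head per subtask:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 39-dim acoustic frames, one shared hidden layer,
# two articulatory-feature subtasks plus phoneme classification as a subtask.
n_in, n_hidden = 39, 64
task_sizes = {"manner": 6, "place": 7, "phoneme": 40}  # illustrative label counts

# Shared input-to-hidden weights; one output head per subtask.
W_shared = rng.standard_normal((n_in, n_hidden)) * 0.1
heads = {t: rng.standard_normal((n_hidden, k)) * 0.1 for t, k in task_sizes.items()}

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(frames):
    """All subtasks read the same hidden representation, which is what
    lets the network capture inter-feature dependencies compactly."""
    h = np.tanh(frames @ W_shared)            # common hidden layer
    return {t: softmax(h @ W) for t, W in heads.items()}

posteriors = forward(rng.standard_normal((5, n_in)))  # batch of 5 frames
```

One set of hidden weights serves every subtask, so adding a subtask only adds an output head rather than a whole new MLP.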

4 citations

Proceedings ArticleDOI
29 Oct 2000
TL;DR: Simulation results generally indicate improved separation quality, a higher probability of producing distinct source outputs, and robustness in noisy cases.
Abstract: Techniques for blind separation of mixed speech signals (co-channel speech) have been reported in the literature. One computationally simple method for linear mixtures (suitable for real-time separation) employs a gradient search algorithm to maximize the kurtosis of the outputs (hopefully separated speech signals). We report the results of an enhancement to the algorithm which involves a normalization of the correction matrix used in the update of the separation matrix. Simulation results (using the TIMIT speech corpus) generally indicate improved (sometimes significantly) separation quality, a higher probability of producing distinct source outputs, and robustness in noisy cases.
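A toy reconstruction of this kind of update is sketched below. The abstract gives no equations, so the specific kurtosis-gradient form and the Frobenius-norm normalization of the correction matrix are my own assumptions, and synthetic Laplacian sources stand in for speech:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic super-Gaussian (Laplacian) sources, linearly mixed.
n = 20000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Whiten the mixtures: zero mean, identity covariance.
X = X - X.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(np.cov(X))
X = (evecs / np.sqrt(evals)) @ evecs.T @ X

W = np.eye(2)                 # separation matrix
mu = 0.1                      # step size
for _ in range(200):
    Y = W @ X
    # Correction matrix along the kurtosis gradient: for whitened data,
    # the gradient of sum_i kurt(y_i) points along E[y^3 x^T] - 3 W.
    C = (Y ** 3) @ X.T / n - 3.0 * W
    C /= np.linalg.norm(C)    # normalization of the correction matrix
    W = W + mu * C
    # Keep W orthonormal so the outputs stay decorrelated.
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt

Y = W @ X                     # hopefully separated signals
```

Normalizing the correction matrix fixes the effective step length, which is one plausible way such an update gains robustness when the raw gradient magnitude varies wildly.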

4 citations

Proceedings ArticleDOI
01 Nov 2009
TL;DR: In this work, the F-ratio is computed as a theoretical measure to validate the experimental results on speaker recognition, which reveal the performance of the proposed algorithm in performing speaker recognition based on the minimum distance between test features and clusters.
Abstract: The main objective of this paper is to explore the effectiveness of perceptual features combined with pitch for text-independent speaker recognition. The proposed combined features are captured and training models are developed by a K-means clustering procedure. The speaker recognition system is evaluated on clean test speech, and the experimental results reveal the performance of the proposed algorithm in performing speaker recognition based on the minimum distance between test features and clusters. This algorithm gives an overall accuracy of 99.675% and 98.75% for the combined features and the perceptual features, respectively, for identifying a speaker among 8 speakers chosen randomly from 8 different dialect regions in the TIMIT database. It also gives an average accuracy of 96.375% and 95.625% for perceptual linear predictive cepstrum combined with pitch and perceptual linear predictive cepstrum, respectively, for 8 speakers chosen randomly from the same dialect region. A noteworthy feature of the speaker identification algorithm is that the testing procedure is evaluated on identical messages for all speakers. In this work, the F-ratio is computed as a theoretical measure to validate the experimental results on speaker recognition.
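A minimal sketch of the recognition rule (plain NumPy; synthetic Gaussian frames stand in for perceptual features combined with pitch, and the speaker count, codebook size, and 13-dim features are illustrative): each speaker gets a K-means codebook, and a test utterance is assigned to the speaker whose codebook lies at minimum average distance from the test frames.

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(X, k, iters=50):
    """Plain K-means: returns a k-codeword codebook for one speaker."""
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C

# Synthetic per-speaker feature frames (stand-ins for perceptual
# features plus pitch); well-separated clouds for the demo.
train = {s: rng.normal(loc=3.0 * s, scale=1.0, size=(200, 13)) for s in range(3)}
codebooks = {s: kmeans(F, k=8) for s, F in train.items()}

def identify(frames):
    """Minimum-distance rule: average distance of each test frame to its
    nearest codeword, minimized over speakers."""
    def dist(C):
        return ((frames[:, None] - C[None]) ** 2).sum(-1).min(axis=1).mean()
    return min(codebooks, key=lambda s: dist(codebooks[s]))

pred = identify(rng.normal(loc=3.0, scale=1.0, size=(50, 13)))  # frames near speaker 1
```

The same frame-wise distance accumulation works regardless of message content, which is consistent with the text-independent setting described above.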

4 citations

Proceedings ArticleDOI
01 Dec 2014
TL;DR: The proposed CRF-based phoneme sequence recognition approach is capable of achieving performance similar to standard hybrid HMM/ANN and ANN/CRF systems where the ANN is trained with manual segmentation.
Abstract: State-of-the-art phoneme sequence recognition systems are based on the hybrid hidden Markov model/artificial neural network (HMM/ANN) framework. In this framework, the local classifier, an ANN, is typically trained using the Viterbi expectation-maximization algorithm, which involves two separate steps: phoneme sequence segmentation and training of the ANN. In this paper, we propose a CRF-based phoneme sequence recognition approach that simultaneously infers the phoneme segmentation and classifies the phoneme sequence. More specifically, the phoneme sequence recognition system consists of a local classifier ANN followed by a conditional random field (CRF) whose parameters are trained jointly, using a cost function that discriminates the true phoneme sequence against all competing sequences. In order to train such a system efficiently, we introduce a novel CRF-based segmentation using an acyclic graph. We study the viability of the proposed approach on the TIMIT phoneme recognition task. Our studies show that the proposed approach is capable of achieving performance similar to standard hybrid HMM/ANN and ANN/CRF systems where the ANN is trained with manual segmentation.
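The joint decoding idea (choosing a label sequence whose score combines local classifier outputs with transition terms, with segments emerging as runs of the same label) can be illustrated with a plain Viterbi pass over a linear chain. This is a hypothetical simplification: the paper's acyclic-graph segmentation and joint discriminative training are not reproduced here, and the scores are random stand-ins for log ANN posteriors.

```python
import numpy as np

rng = np.random.default_rng(3)

n_frames, n_phones = 12, 5
emission = rng.standard_normal((n_frames, n_phones))    # stand-in for log ANN posteriors
transition = rng.standard_normal((n_phones, n_phones))  # CRF-style transition scores

def viterbi(emission, transition):
    """Best-scoring label sequence under local + transition scores.
    Runs of the same label implicitly define the segmentation."""
    T, K = emission.shape
    delta = np.zeros((T, K))            # best score ending in each label
    back = np.zeros((T, K), dtype=int)  # backpointers
    delta[0] = emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + transition   # K x K predecessor scores
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emission[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

path = viterbi(emission, transition)
```

Because the transition terms couple adjacent frames, the decoded path can differ from the frame-wise argmax while scoring at least as well as it, which is the benefit of sequence-level inference.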

4 citations

Journal ArticleDOI
TL;DR: This talk presents phonetic models that capture both the dynamic characteristics and the statistical dependencies of acoustic attributes in a segment‐based framework that compares favorably with other studies using the TIMIT corpus.
Abstract: This talk presents phonetic models that capture both the dynamic characteristics and the statistical dependencies of acoustic attributes in a segment‐based framework. The approach is based on the creation of a track, Tα, for each phonetic unit α. The track serves as a model of the dynamic trajectories of the acoustic attributes over the segment. The statistical framework for scoring incorporates the auto‐ and cross‐correlation properties of the track error over time, within a segment. On a vowel classification task [W. Goldenthal and J. Glass, ‘‘Modeling Spectra Dynamics for Vowel Classification,’’ Proc. Eurospeech 93, pp. 289–292, Berlin, Germany (1993)], this methodology achieved classification performance of 68.9%. This result compares favorably with other studies using the TIMIT corpus. This talk extends this result by presenting context‐independent and context‐dependent experiments for all the phones. Context‐independent classification performance of 76.8% is demonstrated. The key to implementing the...
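A loose sketch of the track idea: average the time-normalized training trajectories of each unit into a track, then classify a test segment by its track error. This simplifies the paper's model in two labeled ways: the auto- and cross-correlation error model is replaced with plain mean squared error, and the data (two made-up "phones" with rising and falling trajectories) are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

n_attr, track_len = 14, 10   # attribute dimension, normalized track length (illustrative)

def resample(seg, length):
    """Linearly time-normalize a variable-length segment to `length` frames."""
    idx = np.linspace(0, len(seg) - 1, length)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, len(seg) - 1)
    w = (idx - lo)[:, None]
    return seg[lo] * (1 - w) + seg[hi] * w

def make_track(segments):
    """Track T_alpha: the average time-normalized trajectory of a unit."""
    return np.mean([resample(s, track_len) for s in segments], axis=0)

def track_error(segment, track):
    """Simplified score: mean squared track error. (The paper instead models
    the auto- and cross-correlation of this error over time.)"""
    return ((resample(segment, track_len) - track) ** 2).mean()

def synth(base, n_seg):
    """Synthetic segments: a base trajectory plus noise, random lengths."""
    segs = []
    for _ in range(n_seg):
        L = int(rng.integers(8, 16))
        t = np.linspace(0.0, 1.0, L)[:, None]
        segs.append(np.tile(base(t), (1, n_attr)) + 0.1 * rng.standard_normal((L, n_attr)))
    return segs

rise, fall = (lambda t: t), (lambda t: 1.0 - t)   # two made-up "phones"
tracks = {"aa": make_track(synth(rise, 20)), "iy": make_track(synth(fall, 20))}

test_seg = synth(rise, 1)[0]
pred = min(tracks, key=lambda a: track_error(test_seg, tracks[a]))
```

Time-normalizing every segment to a common length is what lets one fixed-size track model segments of varying duration.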

4 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95