Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have appeared within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
30 May 2002
TL;DR: A real-time wideband speech codec built on a wavelet packet methodology, which adapts the probability model of the quantized coefficients frame by frame with a competitive neural network to better model the speech characteristics of the current speaker.
Abstract: We developed a real-time wideband speech codec adopting a wavelet packet based methodology. The transform-domain coefficients were first quantized by means of a mid-tread uniform quantizer and then encoded with arithmetic coding. In the first step the wavelet coefficients were quantized using a psycho-acoustic model. The second step was carried out by adapting the probability model of the quantized coefficients frame by frame by means of a competitive neural network. The neural network was trained on the TIMIT corpus and its weights were updated in real time during compression, to better model the speech characteristics of the current speaker. The coding/decoding algorithm was first written in C and then optimised on the TMS320C6000 DSP platform.
1 citation
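The mid-tread uniform quantizer used in the codec's first step can be sketched in a few lines. This is a generic illustration, not the paper's implementation: the step size here is a stand-in for the psycho-acoustically derived steps the authors used.

```python
def quantize_mid_tread(x, step):
    """Map a coefficient to an integer index; in a mid-tread quantizer
    zero sits at the centre of a reconstruction level, so small values
    quantize to exactly zero."""
    return round(x / step)

def dequantize(index, step):
    """Reconstruct the coefficient from its quantization index."""
    return index * step

# Illustrative coefficients and step size (not from the paper).
coeffs = [0.04, -0.26, 1.13, 0.49]
step = 0.1
indices = [quantize_mid_tread(c, step) for c in coeffs]
recon = [dequantize(i, step) for i in indices]
```

The integer indices produced this way are exactly what an arithmetic coder consumes: the adaptive probability model assigns each index a probability, and frequent indices (near zero, for a mid-tread quantizer) cost few bits.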
23 May 2022
Abstract: Recently, a growing interest in unsupervised learning of disentangled representations has been observed, with successful applications to both synthetic and real data. In speech processing, such methods have been able to disentangle speakers’ attributes from verbal content. To better understand disentanglement, synthetic data is necessary, as it provides a controllable framework in which to train models and evaluate disentanglement. Thus, we introduce diSpeech, a corpus of speech synthesized with the Klatt synthesizer. Its first version is restricted to vowels synthesized from 5 generative factors based on pitch and formants. Experiments show the ability of variational autoencoders to disentangle these generative factors and assess the reliability of disentanglement metrics. Besides providing a benchmark for speech disentanglement methods, diSpeech also enables the objective evaluation of disentanglement on real speech, which is to our knowledge unprecedented. To illustrate this methodology, we apply it to TIMIT’s isolated vowels.
1 citation
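The core idea of Klatt-style synthesis — excite a cascade of second-order formant resonators with a periodic source — can be sketched in plain Python. This is far simpler than the actual Klatt synthesizer behind diSpeech (no glottal-pulse shaping, no anti-resonances, fixed parameters), and the pitch/formant values below are illustrative, not the corpus's generative factors.

```python
import math

def resonator_coeffs(f, bw, fs):
    """Two-pole digital resonator: centre frequency f and bandwidth bw
    in Hz at sample rate fs (the standard Klatt resonator recipe)."""
    r = math.exp(-math.pi * bw / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * f / fs)
    b2 = -r * r
    a0 = 1.0 - b1 - b2  # unity gain at DC
    return a0, b1, b2

def synth_vowel(f0, formants, bandwidths, fs=16000, dur=0.2):
    """Excite a cascade of formant resonators with an impulse train."""
    n = int(fs * dur)
    period = int(fs / f0)
    out = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # source
    for f, bw in zip(formants, bandwidths):
        a0, b1, b2 = resonator_coeffs(f, bw, fs)
        y1 = y2 = 0.0
        filtered = []
        for x in out:
            y = a0 * x + b1 * y1 + b2 * y2
            filtered.append(y)
            y1, y2 = y, y1
        out = filtered
    return out

# Illustrative /a/-like parameters: F0 = 120 Hz, F1-F3 with bandwidths in Hz.
wave = synth_vowel(120, [700, 1200, 2600], [80, 90, 120])
```

Because every sample is a deterministic function of the pitch and formant parameters, corpora built this way give exactly the controllable generative factors that disentanglement metrics need.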
23 Sep 2020
TL;DR: Careful selection of traditional techniques may lead to very satisfying results in terms of achieved EER values.
Abstract: The aim of this paper is to present some research on speaker verification system based on Gaussian Mixture Model-Universal Background Model (GMM-UBM) approach. All tests were done for the TIMIT corpus. Performance for the standard Mel-Frequency Cepstral Coefficients (MFCC) and dynamic delta features is shown. Influence of feature dimensionality and model complexity on Equal Error Rate (EER) is presented. Additionally, an impact of Voice Activity Detection (VAD) and normalization techniques like Cepstral Mean and Variance Normalization (CMVN) and RelAtive SpecTrA (RASTA) filtering is covered. Each combination of factors was examined. It is shown that careful selection of traditional techniques may lead to very satisfying results when it comes to achieved EER values.
1 citation
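Two of the ingredients above are simple enough to sketch directly: CMVN, which normalizes each cepstral coefficient over an utterance, and the EER, the operating point where false-accept and false-reject rates coincide. A minimal sketch in plain Python, with illustrative data rather than the paper's TIMIT features and GMM-UBM scores:

```python
def cmvn(frames):
    """Cepstral mean and variance normalization: per coefficient,
    subtract the utterance mean and divide by the standard deviation."""
    dims = range(len(frames[0]))
    n = len(frames)
    means = [sum(f[d] for f in frames) / n for d in dims]
    stds = [(sum((f[d] - means[d]) ** 2 for f in frames) / n) ** 0.5 or 1.0
            for d in dims]
    return [[(f[d] - means[d]) / stds[d] for d in dims] for f in frames]

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over all observed scores and report the point
    where false-accept and false-reject rates are closest."""
    best_gap, eer = 2.0, None
    for t in sorted(genuine + impostor):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

CMVN removes stationary channel effects (a constant convolutional distortion becomes an additive cepstral offset, which the mean subtraction cancels), which is why it pairs naturally with MFCC features in verification pipelines.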
23 Aug 2010
TL;DR: A rule-weight learning algorithm for fuzzy rule-based classifiers that, in its cost-sensitive mode, minimizes the sum of costs for misclassified examples and considerably improves the prediction ability of the classifier.
Abstract: Our aim in this paper is to propose a rule-weight learning algorithm in fuzzy rule-based classifiers. The proposed algorithm is presented in two modes: first, all training examples are assumed to be equally important and the algorithm attempts to minimize the error-rate of the classifier on the training data by adjusting the weight of each fuzzy rule in the rule-base, and second, a weight is assigned to each training example as the cost of misclassification of it using the class distribution of its neighbors. Then, instead of minimizing the error-rate, the learning algorithm is modified to minimize the sum of costs for misclassified examples. Using six data sets from UCI-ML repository and the TIMIT speech corpus for frame wise phone classification, we show that our proposed algorithm considerably improves the prediction ability of the classifier.
1 citation
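The second mode assigns each training example a misclassification cost derived from the class distribution of its neighbors. The paper's exact cost formula is not reproduced here; the sketch below uses one plausible stand-in (the fraction of an example's k nearest neighbours sharing its label, so points deep inside their own class cost more to misclassify than points near a class boundary).

```python
import math

def knn_costs(examples, labels, k=3):
    """Hypothetical cost assignment from neighbourhood class
    distribution: cost of misclassifying example i = fraction of its
    k nearest neighbours that share label i. Illustrative only."""
    costs = []
    for i, (xi, yi) in enumerate(zip(examples, labels)):
        dists = sorted(
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(examples, labels))
            if j != i
        )
        neighbours = [lab for _, lab in dists[:k]]
        costs.append(neighbours.count(yi) / k)
    return costs
```

With such costs in hand, the rule-weight update simply replaces the 0/1 error count with the sum of costs over misclassified examples.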
TL;DR: In this article, Evers et al. presented a method for distinguishing automatically between sibilant fricatives using the slope of regression lines over separate frequency ranges within a DFT spectrum.
Abstract: Acoustic cues to the distinction between sibilant fricatives are claimed to be invariant across languages. Evers et al. (1998) present a method for distinguishing automatically between [s] and [ʃ], using the slope of regression lines over separate frequency ranges within a DFT spectrum. They report accuracy rates in excess of 90% for fricatives extracted from recordings of minimal pairs in English, Dutch and Bengali. These findings are broadly replicated by Maniwa et al. (2009), using VCV tokens recorded in the lab. We tested the algorithm from Evers et al. (1998) against tokens of fricatives extracted from the TIMIT corpus of American English read speech, and the Kiel corpora of German. We were able to achieve similar accuracy rates to those reported in previous studies, with the following caveats: (1) the measure relies on being able to perform a DFT for frequencies from 0 to 8 kHz, so that a minimum sampling rate of 16 kHz is necessary for it to be effective, and (2) although the measure draws a simila...
1 citation
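The measure at the heart of that method — the least-squares slope of the log-magnitude DFT spectrum over a chosen frequency band — can be sketched as follows. The band limits and any classification threshold are placeholders; the specific ranges used by Evers et al. (1998) are not reproduced here.

```python
import cmath
import math

def dft_mag(signal):
    """Magnitude spectrum of a real signal (naive DFT, first half of bins)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_slope(mags, fs, f_lo, f_hi):
    """Least-squares slope (dB per Hz) of the log-magnitude spectrum
    over [f_lo, f_hi]; mags is assumed to come from dft_mag, so bin k
    sits at frequency k * fs / (2 * len(mags))."""
    n = 2 * len(mags)
    pts = [(k * fs / n, 20 * math.log10(m + 1e-12))
           for k, m in enumerate(mags)
           if f_lo <= k * fs / n <= f_hi]
    xs, ys = zip(*pts)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The paper's first caveat falls straight out of the bin-frequency formula: covering 0 to 8 kHz requires bins up to fs/2 = 8 kHz, hence a sampling rate of at least 16 kHz.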