Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have appeared within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
30 May 2002
TL;DR: A real-time wideband speech codec built on a wavelet packet methodology, which adapts the probability model of the quantized coefficients frame by frame with a competitive neural network to better model the speech characteristics of the current speaker.
Abstract: We developed a real-time wideband speech codec adopting a wavelet packet based methodology. The transform-domain coefficients were first quantized by means of a mid-tread uniform quantizer and then encoded with arithmetic coding. In the first step the wavelet coefficients were quantized using a psycho-acoustic model. The second step was carried out by adapting the probability model of the quantized coefficients frame by frame by means of a competitive neural network. The neural network was trained on the TIMIT corpus and its weights were updated in real time during compression, to better model the speech characteristics of the current speaker. The coding/decoding algorithm was first written in C and then optimised on the TMS320C6000 DSP platform.
1 citation
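The mid-tread uniform quantizer used in the codec's first step can be sketched in a few lines. This is a generic illustration, not the paper's implementation: the step size here is a stand-in for the psycho-acoustically derived steps the authors used.

```python
def quantize_mid_tread(x, step):
    """Map a coefficient to an integer index; in a mid-tread quantizer
    zero sits at the centre of a reconstruction level, so small values
    quantize to exactly zero."""
    return round(x / step)

def dequantize(index, step):
    """Reconstruct the coefficient from its quantization index."""
    return index * step

# Illustrative coefficients and step size (not from the paper).
coeffs = [0.04, -0.26, 1.13, 0.49]
step = 0.1
indices = [quantize_mid_tread(c, step) for c in coeffs]
recon = [dequantize(i, step) for i in indices]
```

The integer indices produced this way are exactly what an arithmetic coder consumes: the adaptive probability model assigns each index a probability, and frequent indices (near zero, for a mid-tread quantizer) cost few bits.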
23 May 2022
Abstract: Recently, a growing interest in unsupervised learning of disentangled representations has been observed, with successful applications to both synthetic and real data. In speech processing, such methods have been able to disentangle speakers’ attributes from verbal content. To better understand disentanglement, synthetic data is necessary, as it provides a controllable framework in which to train models and evaluate disentanglement. Thus, we introduce diSpeech, a corpus of speech synthesized with the Klatt synthesizer. Its first version is restricted to vowels synthesized from 5 generative factors based on pitch and formants. Experiments show the ability of variational autoencoders to disentangle these generative factors and assess the reliability of disentanglement metrics. Besides providing a benchmark for speech disentanglement methods, diSpeech also enables the objective evaluation of disentanglement on real speech, which is to our knowledge unprecedented. To illustrate this methodology, we apply it to TIMIT’s isolated vowels.
1 citation
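The core idea of Klatt-style synthesis — excite a cascade of second-order formant resonators with a periodic source — can be sketched in plain Python. This is far simpler than the actual Klatt synthesizer behind diSpeech (no glottal-pulse shaping, no anti-resonances, fixed parameters), and the pitch/formant values below are illustrative, not the corpus's generative factors.

```python
import math

def resonator_coeffs(f, bw, fs):
    """Two-pole digital resonator: centre frequency f and bandwidth bw
    in Hz at sample rate fs (the standard Klatt resonator recipe)."""
    r = math.exp(-math.pi * bw / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * f / fs)
    b2 = -r * r
    a0 = 1.0 - b1 - b2  # unity gain at DC
    return a0, b1, b2

def synth_vowel(f0, formants, bandwidths, fs=16000, dur=0.2):
    """Excite a cascade of formant resonators with an impulse train."""
    n = int(fs * dur)
    period = int(fs / f0)
    out = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # source
    for f, bw in zip(formants, bandwidths):
        a0, b1, b2 = resonator_coeffs(f, bw, fs)
        y1 = y2 = 0.0
        filtered = []
        for x in out:
            y = a0 * x + b1 * y1 + b2 * y2
            filtered.append(y)
            y1, y2 = y, y1
        out = filtered
    return out

# Illustrative /a/-like parameters: F0 = 120 Hz, F1-F3 with bandwidths in Hz.
wave = synth_vowel(120, [700, 1200, 2600], [80, 90, 120])
```

Because every sample is a deterministic function of the pitch and formant parameters, corpora built this way give exactly the controllable generative factors that disentanglement metrics need.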
23 Sep 2020
TL;DR: Careful selection of traditional techniques may lead to very satisfying results in terms of achieved EER values.
Abstract: The aim of this paper is to present some research on speaker verification system based on Gaussian Mixture Model-Universal Background Model (GMM-UBM) approach. All tests were done for the TIMIT corpus. Performance for the standard Mel-Frequency Cepstral Coefficients (MFCC) and dynamic delta features is shown. Influence of feature dimensionality and model complexity on Equal Error Rate (EER) is presented. Additionally, an impact of Voice Activity Detection (VAD) and normalization techniques like Cepstral Mean and Variance Normalization (CMVN) and RelAtive SpecTrA (RASTA) filtering is covered. Each combination of factors was examined. It is shown that careful selection of traditional techniques may lead to very satisfying results when it comes to achieved EER values.
1 citation
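Two of the ingredients above are simple enough to sketch directly: CMVN, which normalizes each cepstral coefficient over an utterance, and the EER, the operating point where false-accept and false-reject rates coincide. A minimal sketch in plain Python, with illustrative data rather than the paper's TIMIT features and GMM-UBM scores:

```python
def cmvn(frames):
    """Cepstral mean and variance normalization: per coefficient,
    subtract the utterance mean and divide by the standard deviation."""
    dims = range(len(frames[0]))
    n = len(frames)
    means = [sum(f[d] for f in frames) / n for d in dims]
    stds = [(sum((f[d] - means[d]) ** 2 for f in frames) / n) ** 0.5 or 1.0
            for d in dims]
    return [[(f[d] - means[d]) / stds[d] for d in dims] for f in frames]

def equal_error_rate(genuine, impostor):
    """Sweep thresholds over all observed scores and report the point
    where false-accept and false-reject rates are closest."""
    best_gap, eer = 2.0, None
    for t in sorted(genuine + impostor):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

CMVN removes stationary channel effects (a constant convolutional distortion becomes an additive cepstral offset, which the mean subtraction cancels), which is why it pairs naturally with MFCC features in verification pipelines.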
23 Aug 2010
TL;DR: A rule-weight learning algorithm for fuzzy rule-based classifiers that, in its cost-sensitive mode, minimizes the sum of costs for misclassified examples and considerably improves the prediction ability of the classifier.
Abstract: Our aim in this paper is to propose a rule-weight learning algorithm in fuzzy rule-based classifiers. The proposed algorithm is presented in two modes: first, all training examples are assumed to be equally important and the algorithm attempts to minimize the error-rate of the classifier on the training data by adjusting the weight of each fuzzy rule in the rule-base, and second, a weight is assigned to each training example as the cost of misclassification of it using the class distribution of its neighbors. Then, instead of minimizing the error-rate, the learning algorithm is modified to minimize the sum of costs for misclassified examples. Using six data sets from UCI-ML repository and the TIMIT speech corpus for frame wise phone classification, we show that our proposed algorithm considerably improves the prediction ability of the classifier.
1 citation
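The second mode assigns each training example a misclassification cost derived from the class distribution of its neighbors. The paper's exact cost formula is not reproduced here; the sketch below uses one plausible stand-in (the fraction of an example's k nearest neighbours sharing its label, so points deep inside their own class cost more to misclassify than points near a class boundary).

```python
import math

def knn_costs(examples, labels, k=3):
    """Hypothetical cost assignment from neighbourhood class
    distribution: cost of misclassifying example i = fraction of its
    k nearest neighbours that share label i. Illustrative only."""
    costs = []
    for i, (xi, yi) in enumerate(zip(examples, labels)):
        dists = sorted(
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(examples, labels))
            if j != i
        )
        neighbours = [lab for _, lab in dists[:k]]
        costs.append(neighbours.count(yi) / k)
    return costs
```

With such costs in hand, the rule-weight update simply replaces the 0/1 error count with the sum of costs over misclassified examples.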
TL;DR: In this article, Evers et al. presented a method for distinguishing automatically between sibilant fricatives using the slope of regression lines over separate frequency ranges within a DFT spectrum.
Abstract: Acoustic cues to the distinction between sibilant fricatives are claimed to be invariant across languages. Evers et al. (1998) present a method for distinguishing automatically between [s] and [ʃ], using the slope of regression lines over separate frequency ranges within a DFT spectrum. They report accuracy rates in excess of 90% for fricatives extracted from recordings of minimal pairs in English, Dutch and Bengali. These findings are broadly replicated by Maniwa et al. (2009), using VCV tokens recorded in the lab. We tested the algorithm from Evers et al. (1998) against tokens of fricatives extracted from the TIMIT corpus of American English read speech, and the Kiel corpora of German. We were able to achieve similar accuracy rates to those reported in previous studies, with the following caveats: (1) the measure relies on being able to perform a DFT for frequencies from 0 to 8 kHz, so that a minimum sampling rate of 16 kHz is necessary for it to be effective, and (2) although the measure draws a simila...
1 citation
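The measure at the heart of that method — the least-squares slope of the log-magnitude DFT spectrum over a chosen frequency band — can be sketched as follows. The band limits and any classification threshold are placeholders; the specific ranges used by Evers et al. (1998) are not reproduced here.

```python
import cmath
import math

def dft_mag(signal):
    """Magnitude spectrum of a real signal (naive DFT, first half of bins)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_slope(mags, fs, f_lo, f_hi):
    """Least-squares slope (dB per Hz) of the log-magnitude spectrum
    over [f_lo, f_hi]; mags is assumed to come from dft_mag, so bin k
    sits at frequency k * fs / (2 * len(mags))."""
    n = 2 * len(mags)
    pts = [(k * fs / n, 20 * math.log10(m + 1e-12))
           for k, m in enumerate(mags)
           if f_lo <= k * fs / n <= f_hi]
    xs, ys = zip(*pts)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The paper's first caveat falls straight out of the bin-frequency formula: covering 0 to 8 kHz requires bins up to fs/2 = 8 kHz, hence a sampling rate of at least 16 kHz.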