scispace - formally typeset

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Book ChapterDOI
01 Jan 1998
TL;DR: In this paper, the authors develop two computationally efficient methods for a class of noise reduction and signal reconstruction problems; the methods deal with compression nonlinearities, a quantitative characterization of frames of translates, irregular sampling theorems, and generalized spatio-temporal Laplacians.
Abstract: Frames were introduced by Duffin and Schaeffer in 1952 to deal with problems in nonharmonic Fourier series, and have been used more recently in signal analysis. Frames provide a useful starting point for reconstructing signals embedded in certain noises. We develop two methods to solve a class of noise reduction and signal reconstruction problems in a computationally efficient way. These methods go beyond the elementary properties of frames: they deal with compression nonlinearities, a quantitative characterization of frames of translates, irregular sampling theorems, and generalized spatio-temporal Laplacians. Applications of these methods are made on EEG, ECoG, TIMIT, and MRI data.
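The frame-based reconstruction this abstract builds on can be illustrated with the textbook canonical-dual reconstruction; a minimal numpy sketch (this is the standard least-squares recipe, not the authors' more advanced methods):

```python
import numpy as np

def frame_reconstruct(F, coeffs):
    """Reconstruct a signal x from its frame coefficients c = F @ x
    using the canonical dual frame (F^T F)^{-1} F^T -- the textbook
    least-squares reconstruction that the frame bounds keep stable."""
    S = F.T @ F                       # frame operator
    return np.linalg.solve(S, F.T @ coeffs)

# Redundant frame for R^2: three unit vectors at 0, 120, 240 degrees.
F = np.array([[1.0, 0.0],
              [-0.5, np.sqrt(3) / 2],
              [-0.5, -np.sqrt(3) / 2]])
x = np.array([2.0, -1.0])
print(frame_reconstruct(F, F @ x))    # recovers [ 2. -1.]
```

Because this particular frame is tight (S is a multiple of the identity), the reconstruction is exact even though the three coefficients are redundant for a 2-D signal.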

7 citations

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work proposes calculating the join cost as a distance in articulatory space and shows, in a listening test, that the articulatory cost is preferred at a rate of 58% over the standard Multisyn acoustic join cost.
Abstract: Join cost calculation has so far dealt exclusively with acoustic speech parameters, and a large number of distance metrics have previously been tested in conjunction with a wide variety of acoustic parameterisations. In contrast, we propose here to calculate distance in articulatory space. The motivation for this is simple: physical constraints mean a human talker's mouth cannot "jump" from one configuration to a different one, so smooth evolution of articulator positions would also seem desirable for a good candidate unit sequence. To test this, we built Festival Multisyn voices using a large articulatory-acoustic dataset. We first synthesised 460 TIMIT sentences and confirmed our articulatory join cost gives appreciably different unit sequences compared to the standard Multisyn acoustic join cost. A listening test (3 sets of 25 sentence pairs, 30 listeners) then showed our articulatory cost is preferred at a rate of 58% compared to the standard Multisyn acoustic join cost.
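The proposed cost amounts to measuring how far the articulators would have to "jump" at a unit boundary; a minimal sketch as a Euclidean distance between articulator position vectors (the dimensions, coil names, and values below are illustrative assumptions, not the paper's actual parameterisation):

```python
import numpy as np

def articulatory_join_cost(left_unit_end, right_unit_start):
    """Euclidean distance between articulator position vectors
    (e.g., EMA-style x/y coordinates for tongue tip, tongue body,
    lips, jaw) at the boundary between two candidate units."""
    return float(np.linalg.norm(np.asarray(left_unit_end) -
                                np.asarray(right_unit_start)))

# Two hypothetical 8-dimensional articulator configurations (mm).
end_of_left_unit    = [0.0, 1.2, -3.0, 0.5, 2.1, 0.0, -1.0, 0.4]
start_of_right_unit = [0.1, 1.0, -2.8, 0.6, 2.0, 0.1, -1.1, 0.5]
print(articulatory_join_cost(end_of_left_unit, start_of_right_unit))
```

In a unit-selection search this cost would replace (or be combined with) the acoustic join cost when scoring candidate unit sequences; a smooth sequence of articulator positions yields small boundary distances.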

7 citations

Proceedings ArticleDOI
01 Sep 2006
TL;DR: Results are presented for two speech processing tasks for BP: phone classification and grapheme-to-phoneme (G2P) conversion.
Abstract: Speech processing is a data-driven technology that relies on public corpora and associated resources. In contrast to languages such as English, there are few resources for Brazilian Portuguese (BP). Consequently, there are no publicly available scripts to design baseline BP systems. This work discusses some efforts towards decreasing this gap and presents results for two speech processing tasks for BP: phone classification and grapheme-to-phoneme (G2P) conversion. The former task used hidden Markov models to classify phones from the Spoltech and TIMIT corpora. The G2P module adopted machine learning methods such as decision trees and was tested on a new BP pronunciation dictionary and the following languages: British English, American English and French.
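Decision-tree G2P systems like the one described typically classify each letter from a window of surrounding letters; a minimal sketch of that feature extraction step (the window width and padding symbol are illustrative assumptions, and the downstream decision-tree classifier is omitted):

```python
def g2p_context_features(word, i, width=2):
    """Feature tuple for grapheme word[i]: the letter plus its
    left/right context letters, padded with '#' at word edges.
    Tuples like this are the usual input to a decision-tree
    grapheme-to-phoneme classifier."""
    padded = "#" * width + word.lower() + "#" * width
    j = i + width                      # index of word[i] in padded
    return tuple(padded[j - width : j + width + 1])

print(g2p_context_features("casa", 1))  # ('#', 'c', 'a', 's', 'a')
```

A classifier trained on aligned (letter-context, phoneme) pairs from a pronunciation dictionary then predicts one phoneme per such window.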

7 citations

Proceedings Article
01 Sep 1998
TL;DR: Validation of the pruning procedure on 567 speakers leads to a significant improvement on TIMIT and NTIMIT (up to 30% error rate reduction on TIMIT), aided by a prior frame-level likelihood normalization that makes comparisons between frames meaningful.
Abstract: In this paper, we propose a frame selection procedure for text-independent speaker identification. Instead of averaging the frame likelihoods along the whole test utterance, some of these are rejected (pruning) and the final score is computed with a limited number of frames. This pruning stage requires a prior frame-level likelihood normalization in order to make comparisons between frames meaningful. This normalization procedure alone leads to a significant performance enhancement. As far as pruning is concerned, the optimal number of frames pruned is learned on a tuning data set for normal and telephone speech. Validation of the pruning procedure on 567 speakers leads to a significant improvement on TIMIT and NTIMIT (up to 30% error rate reduction on TIMIT).
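The normalize-then-prune idea can be sketched in a few lines; the per-frame max subtraction used here is one plausible reading of the paper's frame-level likelihood normalization, and the fixed keep ratio is an illustrative stand-in for the pruning count tuned on held-out data:

```python
import numpy as np

def prune_and_score(frame_loglik, keep_ratio=0.7):
    """frame_loglik: (n_models, n_frames) per-frame log-likelihoods.
    Normalize each frame by its best score across models (so frames
    become comparable), keep only each model's best frames, average
    those, and return the index of the winning speaker model."""
    # Frame-level normalization: subtract the per-frame maximum.
    norm = frame_loglik - frame_loglik.max(axis=0, keepdims=True)
    n_keep = max(1, int(keep_ratio * frame_loglik.shape[1]))
    scores = np.empty(frame_loglik.shape[0])
    for m in range(frame_loglik.shape[0]):
        kept = np.sort(norm[m])[::-1][:n_keep]  # best frames for model m
        scores[m] = kept.mean()
    return int(np.argmax(scores))

# Toy utterance with 4 frames scored against 3 speaker models;
# model 1 fits every frame best.
ll = np.array([[0.0, 0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0, 1.0],
               [0.0, 0.0, 0.0, 0.0]])
print(prune_and_score(ll))   # prints 1
```

Without the normalization step, a few frames with large absolute likelihoods could dominate the pruned average regardless of which model they favour, which is why the paper applies it before pruning.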

7 citations

Book ChapterDOI
01 Sep 2013
TL;DR: A combined model that trains the feature extraction filters jointly with a neural net classifier; in practice it always outperforms the standard two-step method, and results can be significantly improved by using a convolutional version of the network.
Abstract: In speech recognition, spectro-temporal feature extraction and the training of the acoustical model are usually performed separately. To improve recognition performance, we present a combined model which allows the training of the feature extraction filters along with a neural net classifier. Besides expecting that this joint training will result in a better recognition performance, we also expect that such a neural net can generate coefficients for spectro-temporal filters and also enhance preexisting ones, such as those obtained with the two-dimensional Discrete Cosine Transform (2D DCT) and Gabor filters. We tested these assumptions on the TIMIT phone recognition task. The results show that while the initialization based on the 2D DCT or Gabor coefficients is better in some cases than with simple random initialization, the joint model in practice always outperforms the standard two-step method. Furthermore, the results can be significantly improved by using a convolutional version of the network.
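The 2D DCT initialization mentioned above can be sketched by building the separable cosine basis as a small filter bank; filter size and counts are illustrative choices, and the joint training step itself is omitted:

```python
import numpy as np

def dct2_filters(size=8, n_u=3, n_v=3):
    """Build n_u * n_v two-dimensional DCT-II basis filters of shape
    (size, size). Such filters can serve as initial weights for
    spectro-temporal feature-extraction filters that are then
    trained jointly with the classifier."""
    n = np.arange(size)
    filters = []
    for u in range(n_u):
        for v in range(n_v):
            bu = np.cos(np.pi * (2 * n + 1) * u / (2 * size))
            bv = np.cos(np.pi * (2 * n + 1) * v / (2 * size))
            f = np.outer(bu, bv)            # separable 2-D basis filter
            filters.append(f / np.linalg.norm(f))
    return np.stack(filters)                # (n_u * n_v, size, size)

bank = dct2_filters()
print(bank.shape)   # (9, 8, 8)
```

Loading such a bank as the first-layer weights of a (convolutional) network gives the structured starting point the abstract compares against random initialization, while leaving the filters free to be refined by backpropagation.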

7 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95