Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal Article
TL;DR: A waveform-based clipping detection algorithm is proposed for naturalistic audio streams; the results show that clipping introduces a nonlinear distortion into clean speech data, which reduces speech quality and degrades speaker recognition performance.
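The detector itself is not described in this summary; as a rough illustration of what a waveform-based clipping check can look like, the sketch below flags runs of consecutive samples pinned near the waveform's peak and reports the fraction of clipped samples (the 0.99 level and the minimum run length are illustrative assumptions, not values from the paper).

```python
# Hedged sketch of a waveform-based clipping check (not the paper's algorithm):
# flag runs of consecutive samples pinned near the peak amplitude.
import numpy as np

def clipping_ratio(x, level=0.99, min_run=5):
    """Fraction of samples lying in runs of >= min_run samples whose
    magnitude is at least `level` times the signal's peak magnitude."""
    x = np.asarray(x, dtype=float)
    near_peak = np.abs(x) >= level * np.max(np.abs(x))
    clipped = np.zeros_like(near_peak)
    run_start = None
    for i, flag in enumerate(near_peak):
        if flag and run_start is None:
            run_start = i                      # a candidate run begins
        elif not flag and run_start is not None:
            if i - run_start >= min_run:       # long plateau -> clipping
                clipped[run_start:i] = True
            run_start = None
    if run_start is not None and len(x) - run_start >= min_run:
        clipped[run_start:] = True
    return clipped.mean()

# Example: a sine wave hard-clipped at 60% of its amplitude
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220 * t)
print(clipping_ratio(clean), clipping_ratio(np.clip(clean, -0.6, 0.6)))
```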

2 citations

Proceedings Article
01 Feb 2018
TL;DR: The proposed split lattice structure based on sonority detection decreased the phone error rate by nearly 0.9% on the core TIMIT test set, compared with conventional decoding using state-of-the-art Deep Neural Networks (DNNs).
Abstract: Phoneme lattices have been shown to be a good choice for encoding, in a compact way, alternative decoding hypotheses from a speech recognition system. However, the optimal phoneme sequence is produced by tracing all the phoneme identities in the lattice. This not only makes the decoder's search space huge, but the final phoneme sequence is also prone to false substitutions and insertion errors. In this paper, we introduce split lattice structures generated by splitting the speech frames according to the manner of articulation. The spectral flatness measure (SFM) is exploited to detect the two broad manners of articulation: sonorants and non-sonorants. Sonorants broadly comprise the vowels, semivowels and nasals, whereas the fricatives, stop consonants and closures belong to the non-sonorants. A conventional speech decoder produces one lattice per test utterance. In our work, we split the speech frames into sonorants and non-sonorants based on SFM knowledge and generate split lattices. The split lattices are modified according to the manner of articulation in each split, so as to remove irrelevant phoneme identities from the lattice. For instance, the sonorant lattice is forced to exclude non-sonorant phoneme identities, thereby minimizing false substitutions and insertion errors. The proposed split lattice structure based on sonority detection decreased the phone error rate by nearly 0.9% when evaluated on the core TIMIT test corpus, compared with conventional decoding using state-of-the-art Deep Neural Networks (DNNs).
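As a rough illustration of the sonority detection step described above, the sketch below labels each frame sonorant-like or non-sonorant-like from its spectral flatness; the 0.3 threshold, frame length and hop are illustrative assumptions rather than values from the paper.

```python
# Hedged sketch of a spectral-flatness (SFM) based sonorant / non-sonorant
# split: harmonic (sonorant) frames have low flatness, noise-like
# (non-sonorant) frames have high flatness.
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the frame's power spectrum."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

def split_frames(signal, sr=16000, frame_ms=25, hop_ms=10, threshold=0.3):
    """Label each frame 'son' (sonorant-like) or 'non' (non-sonorant-like)."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        sfm = spectral_flatness(signal[start:start + frame_len])
        labels.append('son' if sfm < threshold else 'non')
    return labels

# Example: a vowel-like harmonic segment followed by a fricative-like noise burst
sr = 16000
t = np.arange(sr // 2) / sr
vowel = sum(np.sin(2 * np.pi * f * t) for f in (150, 300, 450, 600)) / 4
labels = split_frames(np.concatenate([vowel, np.random.randn(sr // 2)]), sr)
print(labels[:5], labels[-5:])   # mostly 'son' first, 'non' last
```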

2 citations

01 Jan 2007
TL;DR: A theoretical validation of the Neural Predictive Coding model is presented under the hypotheses of noise-free signals and signals corrupted by Gaussian noise; classification rates are clearly improved compared with the usual methods, in particular for phonemes considered difficult to process.
Abstract: In this article, we propose to study a speech coding method applied to the recognition of phonemes. The proposed model (Neural Predictive Coding, NPC) and its two variants (NPC-2 and DFE-NPC) are connectionist models (multilayer perceptrons) based on non-linear prediction of the speech signal. We show that it is possible to improve the discriminant capacity of such an encoder by introducing class-membership information about the signal from the coding stage onward. As such, it belongs to the category of Discriminant Feature Extraction (DFE) encoders already proposed in the literature. In this study we present a theoretical validation of the model under the hypotheses of noise-free signals and signals corrupted by Gaussian noise. NPC performance is compared with that obtained using traditional speech processing methods on the DARPA TIMIT and NTIMIT speech databases. The simulations presented here show that classification rates are clearly improved compared with the usual methods, in particular for phonemes considered difficult to process. A small-vocabulary word recognition experiment is provided to show how NPC features can be used in a more conventional ANN-HMM based speech recognition system.
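The exact NPC, NPC-2 and DFE-NPC architectures are not given in this summary; the sketch below only illustrates the general idea of non-linear predictive coding, fitting a small multilayer perceptron to predict each sample of a frame from the preceding samples and using the learned weights as a frame-level feature vector (the predictor order, hidden size and use of scikit-learn are all assumptions).

```python
# Hedged sketch in the spirit of neural predictive coding (not the authors'
# models): the flattened weights of an MLP sample predictor serve as a
# non-linear analogue of LPC coefficients for the frame.
import numpy as np
from sklearn.neural_network import MLPRegressor

def npc_like_features(frame, order=10, hidden=4):
    """Weights of an MLP trained to predict frame[n] from frame[n-order:n]."""
    X = np.array([frame[i:i + order] for i in range(len(frame) - order)])
    y = frame[order:]
    mlp = MLPRegressor(hidden_layer_sizes=(hidden,), activation='tanh',
                       max_iter=500, random_state=0)
    mlp.fit(X, y)
    return np.concatenate([w.ravel() for w in mlp.coefs_])

# Example: features for one 25 ms frame of a synthetic vowel-like signal
sr = 16000
t = np.arange(int(0.025 * sr)) / sr
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 600 * t)
print(npc_like_features(frame).shape)   # (order*hidden + hidden,) = (44,)
```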

2 citations

Proceedings Article
01 Sep 1996
TL;DR: Recognition of voiced speech phonemes is addressed using features extracted from the bispectrum of the speech signal, with voiced speech modeled as a superposition of coupled harmonics located at multiples of the pitch and modulated by the vocal tract.
Abstract: Recognition of voiced speech phonemes is addressed in this paper using features extracted from the bispectrum of the speech signal. Voiced speech is modeled as a superposition of coupled harmonics, located at frequencies that are multiples of the pitch and modulated by the vocal tract. For this type of signal, non-zero bispectral values are shown to be guaranteed by the estimation procedure employed. The vocal tract frequency response is reconstructed from the bispectrum on a set of frequency points that are multiples of the pitch. An AR model is then fitted to this transfer function, and the AR coefficients are used as the feature vector for the subsequent classification step. Any finite-dimensional vector classifier can be employed at this point. Experiments using the LVQ neural classifier give satisfactory classification scores on real speech data extracted from the DARPA/TIMIT speech corpus.
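A much-simplified sketch of two of the building blocks, estimating a bispectral value at a pair of pitch harmonics and fitting AR coefficients by Yule-Walker to a harmonic line spectrum, is given below; the paper's actual reconstruction of the vocal-tract response from the bispectrum is not reproduced, and the segment length, pitch and model order are illustrative assumptions.

```python
# Hedged sketch: (1) segment-averaged bispectrum X(f1)X(f2)X*(f1+f2) at a pair
# of pitch harmonics, (2) Yule-Walker AR fit to the autocorrelation of a sum
# of cosines standing in for the vocal-tract samples at the harmonics.
import numpy as np

def bispectrum_at(x, f1, f2, sr, seg_len=1024):
    """Segment-averaged bispectrum estimate at the frequency pair (f1, f2)."""
    freqs = np.fft.rfftfreq(seg_len, 1.0 / sr)
    i1 = np.argmin(np.abs(freqs - f1))
    i2 = np.argmin(np.abs(freqs - f2))
    i3 = np.argmin(np.abs(freqs - (f1 + f2)))
    vals = []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        X = np.fft.rfft(x[start:start + seg_len] * np.hanning(seg_len))
        vals.append(X[i1] * X[i2] * np.conj(X[i3]))
    return np.mean(vals)

def ar_from_harmonics(amps, f0, sr, order=8):
    """Yule-Walker AR coefficients for a line spectrum with amplitudes `amps`
    at multiples of f0 (a stand-in for the sampled vocal-tract response)."""
    lags = np.arange(order + 1)
    r = sum(0.5 * a ** 2 * np.cos(2 * np.pi * (k + 1) * f0 * lags / sr)
            for k, a in enumerate(amps))
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# Synthetic voiced frame: harmonics of f0 with coupled phases, so the
# bispectrum at (k*f0, f0) is non-zero, as the paper exploits.
sr, f0 = 16000, 125.0
t = np.arange(8192) / sr
amps = np.array([1.0, 0.8, 0.6, 0.5, 0.3, 0.2])
x = sum(a * np.cos(2 * np.pi * (k + 1) * f0 * (t + 0.001))
        for k, a in enumerate(amps))
print(abs(bispectrum_at(x, 2 * f0, f0, sr)))   # clearly non-zero
print(ar_from_harmonics(amps, f0, sr))         # AR feature vector
```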

2 citations

Proceedings Article
21 May 2015
TL;DR: An experiment on the Frame Distance Array (FDA) algorithm was carried out with the main goal of tuning the algorithm's parameters; the best combination of values was chosen by observing the detection rate, the miss rate and the false boundary rate.
Abstract: This work addresses unsupervised automatic speech segmentation. An experiment was carried out on the Frame Distance Array (FDA) algorithm with the main goal of tuning the algorithm's parameters. The algorithm was applied to the TIMIT corpus using MFCCs as the speech signal features. The parameters tuned in this work are the frame length, the frame increment, the number of test frames and the test frame step size. The best combination of values was chosen by observing the detection rate, the miss rate and the false boundary rate. The best tuning was found at 23 ms, 1.5 ms, 9 frames and 2 frames for the frame length, the frame increment, the number of test frames and the test frame step size, respectively.
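The FDA algorithm itself is not defined in this summary; the sketch below assumes one plausible reading of a frame-distance approach, the mean MFCC distance from each frame to a window of following test frames with boundaries placed at local maxima, and plugs in the tuned values above as defaults (the peak-picking rule, the quantile threshold and the librosa-based MFCCs are assumptions, not part of the paper).

```python
# Hedged sketch of a frame-distance segmentation in the spirit of FDA:
# boundaries are hypothesised where a frame is far (in MFCC space) from the
# test frames that follow it.
import numpy as np
import librosa

def frame_distance_boundaries(y, sr, frame_ms=23, hop_ms=1.5,
                              n_test=9, test_step=2, peak_quantile=0.9):
    n_fft = int(sr * frame_ms / 1000)
    hop = max(1, int(sr * hop_ms / 1000))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop).T
    # Distance array: mean Euclidean distance to the following test frames.
    dist = np.zeros(len(mfcc))
    for i in range(len(mfcc)):
        idx = np.arange(i + test_step, i + n_test * test_step + 1, test_step)
        idx = idx[idx < len(mfcc)]
        if len(idx):
            dist[i] = np.mean(np.linalg.norm(mfcc[idx] - mfcc[i], axis=1))
    # Boundary candidates: local maxima above a quantile threshold.
    thr = np.quantile(dist, peak_quantile)
    peaks = [i for i in range(1, len(dist) - 1)
             if dist[i] > thr and dist[i] >= dist[i - 1] and dist[i] >= dist[i + 1]]
    return np.array(peaks) * hop / sr   # boundary times in seconds

# Usage on a 16 kHz TIMIT-style waveform (path is a placeholder):
# y, sr = librosa.load('some_timit_utterance.wav', sr=16000)
# print(frame_distance_boundaries(y, sr))
```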

2 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95