Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Book Chapter
14 Jan 2009
TL;DR: The proposed discriminative alignment algorithm outperforms state-of-the-art systems on the TIMIT corpus; the paper reports experimental results comparing it to previous studies on forced alignment, which use hidden Markov models.
Abstract: We describe and analyze a discriminative algorithm for learning to align a phoneme sequence of a speech utterance with its acoustical signal counterpart by predicting a timing sequence representing the phoneme start times. In contrast to common HMM-based approaches, our method employs a discriminative learning procedure in which the learning phase is tightly coupled with the forced alignment task. The alignment function we devise is based on mapping the input acoustic-symbolic representations of the speech utterance along with the target timing sequence into an abstract vector space. We suggest a specific mapping into the abstract vector space which utilizes standard speech features (e.g., spectral distances) as well as confidence outputs of a frame-based phoneme classifier. Generalizing the notion of separation with a margin used in support vector machines (SVM) for binary classification, we cast the learning task as the problem of finding a vector in an abstract inner-product space. We set the prediction vector to be the solution of a minimization problem with a large set of constraints. Each constraint enforces a gap between the projection of the correct target timing sequence and the projection of an alternative, incorrect, timing sequence onto the vector. Though the number of constraints is very large, we describe a simple iterative algorithm for efficiently learning the vector and analyze the formal properties of the resulting learning algorithm. We report experimental results comparing the proposed algorithm to previous studies on forced alignment, which use hidden Markov models (HMM). The results obtained in our experiments using the discriminative alignment algorithm outperform the state-of-the-art systems on the TIMIT corpus.

2 citations
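
The margin constraints mentioned in the forced-alignment abstract above can be illustrated with a small sketch. It is not the authors' algorithm: the feature vectors, the weight vector, and the fixed `gap` are all hypothetical stand-ins, and the real feature map for (utterance, timing sequence) pairs is not specified here. The function only checks which alternative timing sequences fail to be separated from the correct one by the required gap.

```python
import numpy as np

def violated_constraints(w, phi_correct, phi_alternatives, gap=1.0):
    """Return indices of alternatives whose margin falls short of `gap`.

    w                : prediction vector, shape (d,)
    phi_correct      : features of the correct timing sequence, shape (d,)
    phi_alternatives : features of incorrect timing sequences, shape (n, d)
    gap              : required margin (a fixed value here, an assumption)
    """
    # Margin = projection of the correct sequence minus projection of each alternative.
    margins = phi_correct @ w - phi_alternatives @ w
    return np.flatnonzero(margins < gap)

# Toy, hand-picked numbers purely for illustration.
w = np.array([1.0, -0.5])
phi_correct = np.array([2.0, 0.0])
phi_alternatives = np.array([[0.0, 0.0], [1.5, 0.0], [2.0, 2.0]])

print(violated_constraints(w, phi_correct, phi_alternatives))  # [1]
```

An iterative learner would typically update `w` using the violated constraints it finds; that update rule is left out because the abstract does not describe it.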

Book Chapter
04 Dec 2012
TL;DR: Experimental results obtained from the proposed NRA framework indicate a reasonable improvement over correlation, subspace and standard minimum variance beamforming methods.
Abstract: Distant speech recognition over microphone arrays is challenging, especially in multi-source environments. In this paper, a non-reference anchor array (NRA) framework for distant speech recognition is proposed. The NRA framework uses a non-reference anchor array to capture the interfering speech sources, in addition to the primary array that captures the speech source of interest. The framework uses a linearly constrained minimum variance (LC-MV) beamformer such that the signal coming from the look direction is preserved while rejecting correlated interferences coming from the same direction as the source of interest. The performance of the proposed method is evaluated by conducting experiments on clean speech acquisition from distant microphones and also on distant speech recognition on the TIMIT and MONC databases. Experimental results obtained from the proposed method indicate a reasonable improvement over correlation, subspace and standard minimum variance beamforming methods.

2 citations
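
For readers unfamiliar with the linearly constrained minimum variance beamformer named in the abstract above, here is a minimal sketch of the textbook closed form w = R⁻¹C(CᴴR⁻¹C)⁻¹f. It does not implement the NRA framework itself. The array geometry, steering vector, and covariance matrix are synthetic assumptions chosen only so the example runs.

```python
import numpy as np

def lcmv_weights(R, C, f):
    """Textbook LCMV weights: w = R^-1 C (C^H R^-1 C)^-1 f."""
    R_inv_C = np.linalg.solve(R, C)      # R^-1 C without forming the inverse
    gram = C.conj().T @ R_inv_C          # C^H R^-1 C
    return R_inv_C @ np.linalg.solve(gram, f)

rng = np.random.default_rng(0)
n_mics = 4

# Hypothetical steering vector for a half-wavelength-spaced linear array, 20 degree look direction.
steering = np.exp(1j * np.pi * np.arange(n_mics) * np.sin(np.deg2rad(20.0)))
C = steering[:, None]                    # single constraint: the look direction
f = np.array([1.0 + 0.0j])               # unit gain in the look direction

# Synthetic Hermitian positive-definite covariance standing in for measured array data.
A = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
R = A @ A.conj().T + n_mics * np.eye(n_mics)

w = lcmv_weights(R, C, f)
response = C.conj().T @ w                # should equal f: look direction is preserved
```

The constraint `C.conj().T @ w == f` is what keeps the look-direction signal undistorted, while minimizing `w^H R w` suppresses everything else.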

Proceedings Article
06 Jun 2021
TL;DR: In this paper, a layerwise recurrence without the assumptions of previous work is proposed, which leads to a standard recurrence with modest modifications to reflect use of log-probabilities.
Abstract: We summarise previous work showing that the basic sigmoid activation function arises as an instance of Bayes’s theorem, and that recurrence follows from the prior. We derive a layerwise recurrence without the assumptions of previous work, and show that it leads to a standard recurrence with modest modifications to reflect use of log-probabilities. The resulting architecture closely resembles the Li-GRU which is the current state of the art for ASR. Although the contribution is mainly theoretical, we show that it is able to outperform the state of the art on the TIMIT and AMI datasets.

2 citations
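
The statement in the abstract above that the sigmoid arises as an instance of Bayes's theorem can be checked numerically for the two-class case. The sketch below is a standard identity, not the paper's layerwise derivation; the likelihood and prior values are arbitrary assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def posterior_bayes(lik1, lik0, prior1):
    """P(c1 | x) computed directly from Bayes's theorem for two classes."""
    num = lik1 * prior1
    return num / (num + lik0 * (1.0 - prior1))

def posterior_sigmoid(lik1, lik0, prior1):
    """Same posterior, written as a sigmoid of the log-odds."""
    log_odds = math.log(lik1 * prior1) - math.log(lik0 * (1.0 - prior1))
    return sigmoid(log_odds)

# Arbitrary illustrative values: p(x|c1)=0.3, p(x|c0)=0.1, P(c1)=0.4.
p_bayes = posterior_bayes(0.3, 0.1, 0.4)
p_sigmoid = posterior_sigmoid(0.3, 0.1, 0.4)
```

Both functions return the same value because dividing numerator and denominator of Bayes's rule by the numerator gives 1 / (1 + exp(-a)), with `a` the log-odds.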

Book Chapter
29 Jul 2013
TL;DR: The results obtained on the corrupted TIMIT database confirm the superiority of the fused system over each individual system in noisy environments, and the drastic degradation of PCA-based systems in the presence of environmental noise.
Abstract: This paper evaluates the impact of low-level features on speaker verification performance, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (MFCC-asymmetric from now on), used stand-alone as features or followed by PCA as a linear projection technique applied before the GMM-UBM back-end classifier, in clean and noisy environments. The performance of the MFCC-asymmetric features is compared with that of the standard Mel-Frequency Cepstral Coefficients (MFCC) extracted from the TIMIT corpus, under clean and noisy conditions. A score-level fusion framework based on simple linear methods such as min, max, sum, …, etc. and training methods like SVM is proposed to improve performance and to mitigate noise degradation. The results obtained on the corrupted TIMIT database confirm the superiority of the fused system over each individual system in noisy environments, and the drastic degradation of PCA-based systems in the presence of environmental noise.

2 citations
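
The simple score-level fusion rules named in the abstract above (min, max, sum) can be sketched as follows. The function name `fuse_scores` and the score values are hypothetical, and the sketch assumes the per-system scores are already on a comparable scale; the paper's normalization and SVM-based fusion are not reproduced.

```python
import numpy as np

def fuse_scores(scores, rule="sum"):
    """Combine per-system verification scores trial by trial.

    scores : array-like, shape (n_systems, n_trials)
    rule   : one of "min", "max", "sum"
    """
    rules = {"min": np.min, "max": np.max, "sum": np.sum}
    if rule not in rules:
        raise ValueError(f"unknown fusion rule: {rule}")
    return rules[rule](np.asarray(scores, dtype=float), axis=0)

# Made-up scores from two systems over three trials.
scores = [[0.2, 0.9, 0.4],
          [0.6, 0.7, 0.1]]

fused_min = fuse_scores(scores, "min")
fused_max = fuse_scores(scores, "max")
fused_sum = fuse_scores(scores, "sum")
```

The fused score for each trial is then compared against a decision threshold in the same way as a single-system score.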

Proceedings Article
01 Dec 2007
TL;DR: The proposed VFR scheme is capable of tracking the evolution of speech due to the underlying phonetic content, and exploiting the non-uniform information flow-rate of speech by using a variable framing strategy.
Abstract: In this paper, we propose a new scheme for variable frame rate (VFR) feature processing based on high level segmentation (HLS) of speech into broad phone classes. Traditional fixed-rate processing is not capable of accurately reflecting the dynamics of continuous speech. On the other hand, the proposed VFR scheme adapts the temporal representation of the speech signal by tying the framing strategy with the detected phone class sequence. The phone classes are detected and segmented by using appropriately trained phonological features (PFs). In this manner, the proposed scheme is capable of tracking the evolution of speech due to the underlying phonetic content, and exploiting the non-uniform information flow-rate of speech by using a variable framing strategy. The new VFR scheme is applied to automatic speech recognition of TIMIT and NTIMIT corpora, where it is compared to a traditional fixed window-size/frame-rate scheme. Our experiments yield encouraging results with relative reductions of 24% and 8% in WER (word error rate) for TIMIT and NTIMIT tasks, respectively.

2 citations
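
The relative WER reductions quoted in the abstract above follow the usual definition (baseline minus new, divided by baseline). The sketch below shows the arithmetic only; the WER values in the example are hypothetical and are not the paper's absolute results, which the abstract does not give.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative error reduction, e.g. 0.24 means a 24% relative reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: a baseline WER of 25.0% lowered to 19.0%.
example = relative_reduction(25.0, 19.0)
```

A relative figure says nothing about the absolute error level, so the same 24% could describe very different systems.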


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95