Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
03 Mar 2007
TL;DR: The UT-Scope database and an automatic and perceptual evaluation of Lombard speech in in-set speaker recognition suggest that a deeper understanding of the cognitive factors involved in perceptual speaker ID offers meaningful insights for the further development of automated systems.
Abstract: This paper presents the UT-Scope database and an automatic and perceptual evaluation of Lombard speech in in-set speaker recognition. The speech used for the analysis forms a part of the UT-Scope database and consists of sentences from the well-known TIMIT corpus, spoken in the presence of highway, large-crowd, and pink noise. First, the deterioration of the EER of an in-set speaker identification system trained on neutral speech and tested with Lombard speech is illustrated. A clear demarcation between the effect of noise and the Lombard effect is also given by testing with noisy Lombard speech. The effect of test-token duration on system performance under the Lombard condition is addressed. We also report results from in-set speaker recognition tasks performed by human subjects in comparison to the system performance. Overall, the observations suggest that a deeper understanding of the cognitive factors involved in perceptual speaker ID offers meaningful insights for the further development of automated systems.
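The equal error rate (EER) used to measure the neutral-vs-Lombard degradation above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of how it can be estimated from trial scores — the score distributions below are purely illustrative, not data from the paper:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER: the threshold point where the false-acceptance
    rate (FAR) equals the false-rejection rate (FRR)."""
    # Sweep thresholds over all observed scores.
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    # The EER lies where the FAR and FRR curves cross; take the
    # midpoint at the closest pair.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Hypothetical score distributions for illustration only: well-separated
# target/non-target trials, as a neutral-trained, neutral-tested system
# might produce.
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)   # target-speaker trial scores
impostor = rng.normal(0.0, 1.0, 1000)  # non-target trial scores
eer = equal_error_rate(genuine, impostor)
```

Shifting the genuine-score distribution toward the impostor one (as a Lombard-speech mismatch would) drives the estimated EER upward.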

29 citations

Journal ArticleDOI
TL;DR: A set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data; the work demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing.
Abstract: We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data. This work shows the possibilities in migrating from DTW-based to model-based approaches for unsupervised STD. The proposed approach consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated patterns are captured and modeled using acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed to represent documents as ASM sequences, which are matched to the query frames. In this way, not only do the ASMs better model the signal distributions and time trajectories of speech, but the much smaller number of states than frames for the documents leads to a much lower computational load. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for the above matching process to handle the speaking-rate distortion problem. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed in the pseudo relevance feedback (PRF) framework. A likelihood ratio evaluated with query/anti-query HMMs trained with pseudo relevant/irrelevant examples is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps in the proposed framework. Experimental results show an absolute 14.2% mean average precision improvement with a 77% CPU time reduction compared with the segmental DTW approach on a Mandarin broadcast news corpus. Consistent improvements were found on TIMIT and the MediaEval 2011 Spoken Web Search corpus.
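The DTW baseline that this model-based framework migrates away from aligns a query's feature frames against a document's frames by dynamic programming. A minimal sketch of frame-level DTW (Euclidean local cost, standard step pattern) — not the paper's segmental DTW or DC-Vite variants:

```python
import numpy as np

def dtw_distance(query, doc):
    """Classic DTW between two feature sequences (frames x dims).
    Returns the accumulated cost of the best monotonic alignment."""
    Q, D = len(query), len(doc)
    # Local frame-to-frame costs via broadcasting (Euclidean distance).
    cost = np.linalg.norm(query[:, None, :] - doc[None, :, :], axis=2)
    # Accumulated-cost matrix with an extra boundary row/column.
    acc = np.full((Q + 1, D + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, Q + 1):
        for j in range(1, D + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1])  # match
    return acc[Q, D]
```

The O(Q x D) frame-by-frame cost of this search is exactly what the DSM approach avoids: matching query frames against a document's much shorter ASM state sequence shrinks the inner dimension.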

29 citations

Proceedings Article
Dong Yu1, Li Deng1, Alex Acero1
01 Sep 2009
TL;DR: It is demonstrated that a 20.8% classification error rate can be achieved on the TIMIT phone classification task using the HCRF-DC model, which is superior to any published single-system result on this heavily evaluated task.
Abstract: We advance the recently proposed hidden conditional random field (HCRF) model by replacing the moment constraints (MCs) with distribution constraints (DCs). We point out that the distribution constraints are the same as the traditional moment constraints for binary features but are able to better regularize the probability distribution of continuous-valued features than the moment constraints. We show that under the distribution constraints the HCRF model is no longer log-linear but embeds the model parameters in non-linear functions. We provide an effective solution to the resulting more difficult optimization problem by converting it to the traditional log-linear form in a higher-dimensional feature space using cubic splines. We demonstrate that a 20.8% classification error rate (CER) can be achieved on the TIMIT phone classification task using the HCRF-DC model. This result is superior to any published single-system result on this heavily evaluated task, including the HCRF-MC model, discriminatively trained HMMs, and large-margin HMMs using the same features. Index Terms: hidden conditional random field, maximum entropy, moment constraint, distribution constraint, phone classification, TIMIT, cubic spline

29 citations

Posted Content
TL;DR: This paper proposes vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task, using either a Gumbel softmax or online k-means clustering to quantize the dense representations.
Abstract: We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a gumbel softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
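The k-means branch of the quantization step described above maps each dense frame representation to the index of its nearest codebook entry, turning continuous audio features into a discrete token sequence that NLP-style models such as BERT can consume. A minimal sketch of that nearest-codeword assignment (illustrative only; vq-wav2vec learns the codebook jointly with the encoder):

```python
import numpy as np

def kmeans_quantize(frames, codebook):
    """Assign each dense frame (frames x dims) to its nearest codebook
    entry (codewords x dims), returning a sequence of discrete indices."""
    # Pairwise distances from every frame to every codeword.
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)  # one token id per frame
```

The Gumbel-softmax variant reaches the same discrete output differently: it samples the codeword index from logits with Gumbel noise so that the choice stays differentiable during training.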

29 citations

Journal ArticleDOI
TL;DR: This paper proposes two convex cost functions for learning the weighting functions and an adaptation strategy to customize the approach to a particular speaker using minimal training, and shows that the proposed methods outperform three competing methods on both healthy and dysarthric speech.
Abstract: Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose the speaking rate estimation problem as that of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal, and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Rather, the proposed method aims to learn an optimal weighting function that can be directly applied to time-frequency features in a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting functions and an adaptation strategy to customize the approach to a particular speaker using minimal training. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech rate estimation, the results show a high correlation between the estimated speaking rate and ground-truth values.
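The core idea — a learned weighting applied to time-frequency features yields a temporal density whose integral over an interval is the speaking rate — can be sketched as follows. The weight vector `w`, the frame rate, and the linear form of the weighting are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

def speaking_rate(features, w, t0, t1, frame_rate=100.0):
    """Apply a (hypothetical) learned weighting w to per-frame
    time-frequency features to obtain a temporal density function,
    then integrate it over [t0, t1] seconds to estimate the rate."""
    density = features @ w               # one density value per frame
    density = np.maximum(density, 0.0)   # a rate density is non-negative
    i0, i1 = int(t0 * frame_rate), int(t1 * frame_rate)
    # Riemann-sum integration: each frame spans 1/frame_rate seconds.
    return density[i0:i1].sum() / frame_rate
```

Integrating the same density over sub-intervals gives local rates, which is what lets one estimator serve both utterance-level and within-utterance rate estimation.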

29 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95