Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: Speech enhancement and parameter compensation techniques previously proposed for robust speech recognition are evaluated and compared, within an HMM-based speech segmentation algorithm, for improved segmentation in colored noise.

55 citations

Posted Content
TL;DR: This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model, and reports that QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
Abstract: Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency in processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
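As a rough illustration of the grouping idea described in this abstract, the sketch below packs four related acoustic values for one time-frequency bin into a quaternion and mixes them with a single quaternion-valued weight via the Hamilton product, so the four components are transformed jointly rather than element by element. The feature layout, variable names, and weight values are illustrative assumptions, not the paper's actual QCNN implementation.

```python
import numpy as np

def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (real, i, j, k) arrays."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,   # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,   # i component
        r1*y2 - x1*z2 + y1*r2 + z1*x2,   # j component
        r1*z2 + x1*y2 - y1*x2 + z1*r2,   # k component
    ])

# Illustrative grouping (an assumption, not the paper's exact layout): for one
# time frame and one filter bank, pack a static value and three related views
# (e.g. delta, delta-delta, energy) into one quaternion, then let a single
# quaternion-valued weight mix all four components jointly.
frame_views = np.array([0.8, 0.1, -0.05, 0.3])   # hypothetical feature quaternion
weight      = np.array([0.5, -0.2, 0.1, 0.05])   # hypothetical weight quaternion
print(hamilton_product(weight, frame_views))
```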

54 citations

Journal ArticleDOI
TL;DR: This work proposes four new cepstral features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise, and proposes an alternative way of performing adaptive component weighting, called the ACW2 cepstrum.
Abstract: A common problem in speaker identification systems is that a mismatch in the training and testing conditions sacrifices much performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. A different cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was previously introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed set, text-independent and vector quantizer based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they do as well or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt.
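For context on the conventional baseline feature mentioned here, the sketch below shows the standard recursion that converts LP coefficients into LP cepstral coefficients (the all-pole LP cepstrum the new pole-zero features are compared against). The ACW, PFL, and ARMA cepstra themselves are not reproduced, and the example coefficients are made up.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients to LP cepstral coefficients.

    `a` holds the coefficients of A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p
    (a[0] is assumed to be 1); the cepstrum returned is that of 1/A(z),
    computed with the usual recursion:
        c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    where a_n = 0 for n > p.
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc - (a[n] if n <= p else 0.0)
    return c[1:]

# Toy usage with made-up LP coefficients (order p = 4):
a = np.array([1.0, -1.2, 0.8, -0.3, 0.1])
print(lpc_to_cepstrum(a, 12))
```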

53 citations

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A fast unsupervised spoken term detection system is presented, based on lower-bound Dynamic Time Warping (DTW) search on Graphics Processing Units (GPUs) together with a K-nearest-neighbor DTW search.
Abstract: In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphical Processing Units (GPUs). The lower-bound estimate and the K nearest neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous implementation on a CPU without affecting detection performance. On large, artificially created corpora, measurements show that the total computation time of the entire spoken term detection system grows linearly with corpus size. On average, searching a keyword on a single desktop computer with modern GPUs requires 2.4 seconds/corpus hour.
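The following is a minimal CPU-only sketch of a Keogh-style lower bound for DTW under a fixed warping radius, the kind of cheap bound that lets most candidate segments be pruned before an exact DTW computation. It is a simplified 1-D illustration under assumed parameters and equal-length sequences, not the GPU kernel or the exact lower-bound estimate used in the paper.

```python
import numpy as np

def lb_keogh(query, candidate, radius):
    """Keogh-style lower bound on the DTW cost between two equal-length 1-D sequences.

    An envelope (running min/max within +/- radius frames) is built around the
    query; any candidate sample falling outside the envelope contributes its
    squared distance to the nearest envelope boundary. Because the bound never
    exceeds the true DTW cost (under a band constraint), candidates whose bound
    already exceeds the best cost so far can be discarded without running DTW.
    """
    total = 0.0
    for i, x in enumerate(candidate):
        window = query[max(0, i - radius): i + radius + 1]
        lo, hi = window.min(), window.max()
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return total

# Toy usage: score one candidate segment against a query of the same length.
q = np.sin(np.linspace(0, 3, 50))
c = np.cos(np.linspace(0, 3, 50))
print(lb_keogh(q, c, radius=5))
```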

53 citations

Proceedings ArticleDOI
Li Deng, Dong Yu
15 Apr 2007
TL;DR: The earlier version of the hidden trajectory model (HTM) for speech dynamics, which predicts the "static" cepstra as the observed acoustic feature, is generalized to a joint static/delta-cepstra HTM, enabling efficient computation of the joint likelihood of both static and delta cepstral sequences as the acoustic features given the model.
Abstract: The earlier version of the hidden trajectory model (HTM) for speech dynamics which predicts the "static" cepstra as the observed acoustic feature is generalized to one which predicts joint static cepstra and their temporal differentials (i.e., delta cepstra). The formulation of this generalized HTM is presented in the generative-modeling framework, enabling efficient computation of the joint likelihood for both static and delta cepstral sequences as the acoustic features given the model. The parameter estimation techniques for the new model are developed and presented, giving closed-form estimation formulas after the use of vector Taylor series approximation. We show principled generalization from the earlier static-cepstra HTM to the new static/delta-cepstra HTM not only in terms of model formulations but also in terms of their respective analytical forms in (monophone) parameter estimation. Experimental results on the standard TIMIT phonetic recognition task demonstrate recognition accuracy improvement over the earlier best HTM system, both significantly better than state-of-the-art triphone HMM systems.
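As a small aside on the delta-cepstra observations discussed in this abstract, the sketch below applies the conventional regression formula for computing delta features from a static cepstral sequence and stacks them into a joint static/delta observation vector. The window size and feature dimensions are illustrative assumptions; this is the standard delta definition, not the HTM's generative prediction of those features.

```python
import numpy as np

def delta_features(static, window=2):
    """First-order temporal differentials (deltas) of a static feature sequence.

    `static` has shape (T, D): T frames of D-dimensional cepstra. Edge frames
    are repeated for padding, and the usual regression formula is applied:
        d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2)
    """
    T, D = static.shape
    padded = np.vstack([static[:1]] * window + [static] + [static[-1:]] * window)
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    delta = np.zeros_like(static)
    for t in range(T):
        for k in range(1, window + 1):
            delta[t] += k * (padded[t + window + k] - padded[t + window - k])
    return delta / denom

# Toy usage: build a joint static/delta observation vector per frame.
static = np.random.randn(100, 13)               # e.g. 13 cepstra over 100 frames
obs = np.hstack([static, delta_features(static)])
print(obs.shape)                                # (100, 26)
```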

53 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95