
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
01 Feb 2013
TL;DR: The low-rank approximation methods provide a way to trade recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT for an increase in word error rate (WER) from 3.2% to 3.5%, and an overall speedup of up to 3.25 for DTW.
Abstract: Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required in many speech processing systems and is the computationally dominant phase of Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has a far smaller impact on recognition accuracy. Experiments on the 1,138-word vocabulary RM1 task and the 6,224-word vocabulary TIMIT task using the Sphinx 3.7 system show that, in a typical case, the matrix-multiplication-based approach yields an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way to trade recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase in word error rate (WER) from 3.2% to 3.5% on RM1 and no increase in WER on TIMIT. We also express the pairwise Euclidean distance computation phase of Dynamic Time Warping (DTW) in terms of matrix multiplication, saving approximately $\frac{1}{3}$ of the computational operations. In our experiments, using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
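
As a rough illustration of the abstract's central idea, the NumPy sketch below augments each feature vector as [x^2, x, 1] so that all diagonal-covariance Gaussian log-likelihoods reduce to a single matrix product, and uses a truncated SVD of the Gaussian parameter matrix as a stand-in for the paper's direct low-rank approximation (the indirect likelihood-matrix factorization and the Sphinx 3.7 setup are not reproduced). The pairwise-distance helper shows the algebraic identity behind the DTW saving: the cross term is one matrix product, and the squared norms are computed once per vector rather than once per pair. All sizes and names are illustrative.

```python
import numpy as np

def augment_features(X):
    """Augment each feature vector x -> [x^2, x, 1] so that diagonal-covariance
    Gaussian log-likelihoods become a single matrix product."""
    return np.hstack([X ** 2, X, np.ones((X.shape[0], 1))])

def gaussian_param_matrix(means, variances):
    """Pack all Gaussian parameters into one matrix G (one column per Gaussian),
    with row blocks matching the [x^2, x, 1] augmentation."""
    d = means.shape[1]
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    top = (-0.5 / variances).T                   # multiplies x^2
    mid = (means / variances).T                  # multiplies x
    bot = log_norm - 0.5 * (means ** 2 / variances).sum(axis=1)  # multiplies 1
    return np.vstack([top, mid, bot[None, :]])

def pairwise_sq_dists(X, Y):
    """||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y; the cross term is one matrix
    product, which is the source of the DTW distance-phase saving."""
    return (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * (X @ Y.T)

rng = np.random.default_rng(0)
T, d, m = 500, 39, 1024                          # frames, feature dim, Gaussians
X = rng.standard_normal((T, d))
means = rng.standard_normal((m, d))
variances = rng.uniform(0.5, 2.0, (m, d))

A = augment_features(X)                          # T x (2d+1)
G = gaussian_param_matrix(means, variances)      # (2d+1) x m
loglik = A @ G                                   # exact: one matrix product

# Direct low-rank approximation of G via truncated SVD: two thin products
# replace one full one, trading a little accuracy for extra speed.
k = 40
U, s, Vt = np.linalg.svd(G, full_matrices=False)
loglik_lr = (A @ (U[:, :k] * s[:k])) @ Vt[:k, :]
print("max abs error:", np.abs(loglik - loglik_lr).max())
```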

2 citations

Proceedings ArticleDOI
15 Feb 2020
TL;DR: This paper builds a phoneme recognition system based on the Listen, Attend and Spell model and uses a word2vec model to initialize the embedding matrix, which improves performance by increasing the distance among the phoneme embedding vectors.
Abstract: In this paper, we present how to hybridize a word2vec model and an attention-based end-to-end speech recognition model. We build a phoneme recognition system based on the Listen, Attend and Spell (LAS) model, and the phoneme recognition model uses a word2vec model to initialize its embedding matrix, which improves performance by increasing the distance among the phoneme embedding vectors. At the same time, to solve the problem of overfitting of the 61-phoneme recognition model on the TIMIT dataset, we propose a new training method: a 61-to-39 phoneme mapping table is used to inverse-map the phonemes of the dataset, generating additional 61-phoneme training data. At the end of training, the dataset is replaced with the standard dataset for corrective training. Our model achieves a best result of 16.5% PER (Phoneme Error Rate) on the TIMIT dataset.
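
A minimal sketch of the embedding-initialization step, assuming gensim (>= 4.0) for word2vec and PyTorch for the decoder embedding; the phoneme corpus, hyperparameters, and vocabulary ordering are illustrative stand-ins, not the paper's configuration, and the LAS model itself is not shown.

```python
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Fabricated phoneme transcriptions standing in for the TIMIT training text.
phone_sentences = [
    ["sil", "sh", "iy", "hh", "ae", "d", "sil"],
    ["sil", "dh", "ah", "k", "ae", "t", "sil"],
]

# Train a small skip-gram word2vec model over the phoneme sequences.
w2v = Word2Vec(phone_sentences, vector_size=64, window=3,
               min_count=1, sg=1, epochs=50)

# Copy the learned vectors into the decoder's embedding matrix in a fixed
# vocabulary order; the LAS decoder then fine-tunes them during training.
vocab = sorted(w2v.wv.key_to_index)
weights = np.stack([w2v.wv[p] for p in vocab])
embedding = nn.Embedding.from_pretrained(
    torch.tensor(weights, dtype=torch.float32),
    freeze=False,  # keep training the embeddings with the rest of the model
)
```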

2 citations

Proceedings ArticleDOI
21 Mar 2018
TL;DR: A systematic approach to keyword spotting (KWS) in continuous speech using a hybrid model based on an LSTM network combined with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi.
Abstract: Recently, the Long Short-Term Memory (LSTM) architecture has been shown to outperform other state-of-the-art approaches, such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN), on many speech recognition tasks. The LSTM network aims to further improve the modeling of long-range temporal dynamics and to remedy the vanishing and exploding gradient problems of the conventional recurrent neural network (RNN). Motivated by the tremendous success of the LSTM, we present in this paper a systematic approach to keyword spotting (KWS) in continuous speech. The system operates in two stages: in the first, the continuous speech is decoded into a phone stream using a hybrid model based on an LSTM network in combination with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi; in the second, the keywords are identified and detected in this phone sequence using a Classification and Regression Tree (CART) implemented in MATLAB. The work and experiments are conducted on the TIMIT dataset.
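
The sketch below illustrates only the second stage, under strong simplifications: sklearn's DecisionTreeClassifier (a CART-style tree) is trained to flag fixed-length windows of an already-decoded phone sequence that contain a keyword. The phone IDs, window length, and training data are fabricated; the paper's MATLAB CART features and the Kaldi LSTM-HMM decoding of stage one are not reproduced.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # a CART-style decision tree

rng = np.random.default_rng(1)
KEYWORD = [5, 12, 7]   # hypothetical phone IDs making up the keyword
WIN = 5                # sliding-window length, in phones

def windows_and_labels(phones, keyword, win):
    """Slice a decoded phone sequence into fixed-length windows; a window is
    positive if the keyword's phone sequence occurs inside it."""
    kw = "".join(map(chr, keyword))
    X, y = [], []
    for i in range(len(phones) - win + 1):
        w = phones[i:i + win]
        X.append(w)
        y.append(int(kw in "".join(map(chr, w))))
    return np.array(X), np.array(y)

# Build a toy training set from random phone streams; stage one (the Kaldi
# LSTM-HMM decoder) would supply real phone sequences instead.
X_all, y_all = [], []
for _ in range(200):
    seq = list(rng.integers(0, 40, size=30))
    if rng.random() < 0.5:                  # plant the keyword half the time
        j = int(rng.integers(0, 27))
        seq[j:j + 3] = KEYWORD
    X, y = windows_and_labels(seq, KEYWORD, WIN)
    X_all.append(X)
    y_all.append(y)

clf = DecisionTreeClassifier(max_depth=8)
clf.fit(np.vstack(X_all), np.concatenate(y_all))
print("train accuracy:", clf.score(np.vstack(X_all), np.concatenate(y_all)))
```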

2 citations

Proceedings ArticleDOI
01 Jun 2000
TL;DR: The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes.
Abstract: In the hidden Markov modeling framework with mixture Gaussians, adaptation is often done by modifying the Gaussian mean vectors using MAP estimation or an MLLR transformation. When the amount of adaptation data is scarce, or when some speech units are unseen in the data, it is necessary to do adaptation in groups, either with regression classes of Gaussians or via vector field smoothing. In this paper, we propose to derive regression classes of subspace Gaussians for MAP adaptation. The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes. Experiments in which context-dependent TIMIT HMMs are adapted to the Resource Management task with a few minutes of speech show that our subspace regression classes improve over traditional full-space regression classes.
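
A loose sketch of the grouping idea, not the paper's algorithm: each Gaussian mean is split into lower-dimensional subspace vectors, the subspace vectors are clustered with k-means into regression classes, and each class is then MAP-updated via mu_map = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t). The adaptation statistics here are fabricated placeholders; a real system accumulates the occupancy counts and first-order statistics from alignments of the adaptation data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
m, d, sub = 256, 39, 13        # Gaussians, full feature dim, subspace dim
means = rng.standard_normal((m, d))

# Split every mean into d/sub lower-dimensional subspace vectors and cluster
# those vectors into regression classes.
subvecs = means.reshape(m * (d // sub), sub)
n_classes = 16
km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(subvecs)

# MAP-update each regression class with pooled adaptation statistics:
#   mu_map = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
tau = 10.0
adapted = subvecs.copy()
for c in range(n_classes):
    idx = np.where(km.labels_ == c)[0]
    gamma = rng.uniform(0.0, 5.0)                            # fake occupancy
    first_order = gamma * (subvecs[idx].mean(axis=0) + 0.3)  # fake stats
    adapted[idx] = (tau * subvecs[idx] + first_order) / (tau + gamma)

adapted_means = adapted.reshape(m, d)
```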

2 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95