Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared on this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
••
01 Feb 2013
TL;DR: The low-rank approximation methods provide a way to trade off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT for an increase in word error rate (WER) from 3.2% to 3.5%, and an overall speedup of up to 3.25 for DTW.
Abstract: Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required in many speech processing systems and is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has far less impact on recognition accuracy. Experiments on the 1,138-word-vocabulary RM1 task and the 6,224-word-vocabulary TIMIT task using the Sphinx 3.7 system show that, in a typical case, the matrix-multiplication-based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way to trade off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase in word error rate (WER) from 3.2% to 3.5% for RM1 and no increase in WER for TIMIT. We also express the pairwise Euclidean distance computation phase of Dynamic Time Warping (DTW) in terms of matrix multiplication, saving approximately $\frac{1}{3}$ of the computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
2 citations
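The abstract above describes folding diagonal-Gaussian likelihood evaluation into a single matrix multiplication over augmented feature vectors, and the same algebraic trick for DTW's pairwise Euclidean distances. A minimal NumPy sketch of both ideas, using toy sizes and random parameters rather than the paper's Sphinx setup:

```python
import numpy as np

rng = np.random.default_rng(0)
D, G, T = 3, 4, 5                         # feature dim, Gaussians, frames
X = rng.normal(size=(T, D))               # feature vectors, one per frame
mu = rng.normal(size=(G, D))              # Gaussian means
var = rng.uniform(0.5, 2.0, size=(G, D))  # diagonal covariances

# Reference: direct per-pair diagonal-Gaussian log-likelihoods.
def loglik_direct(X, mu, var):
    out = np.empty((T, G))
    for t in range(T):
        for g in range(G):
            out[t, g] = (-0.5 * D * np.log(2 * np.pi)
                         - 0.5 * np.log(var[g]).sum()
                         - 0.5 * ((X[t] - mu[g]) ** 2 / var[g]).sum())
    return out

# Matrix form: augment each frame as [x^2, x, 1] and fold each Gaussian's
# quadratic, linear, and constant terms into one parameter row, so all
# T x G log-likelihoods come out of a single matmul.
A = np.hstack([X ** 2, X, np.ones((T, 1))])            # (T, 2D+1)
const = (-0.5 * D * np.log(2 * np.pi)
         - 0.5 * np.log(var).sum(axis=1)
         - 0.5 * (mu ** 2 / var).sum(axis=1))          # (G,)
B = np.hstack([-0.5 / var, mu / var, const[:, None]])  # (G, 2D+1)
L = A @ B.T                                            # (T, G)
assert np.allclose(L, loglik_direct(X, mu, var))

# Same idea for DTW's pairwise squared Euclidean distances:
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the cross terms are one matmul.
Y = rng.normal(size=(6, D))
D2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
ref = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2, ref)
```

The augmented row [x², x, 1] absorbs the quadratic, linear, and constant terms of each Gaussian's log-density, which is why one dense matrix product yields every frame-by-Gaussian score at once.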
••
15 Feb 2020
TL;DR: This paper builds a phoneme recognition system based on the Listen, Attend and Spell model and uses a word2vec model to initialize the embedding matrix, improving performance by increasing the distance among the phoneme embedding vectors.
Abstract: In this paper, we present how to hybridize a word2vec model and an attention-based end-to-end speech recognition model. We build a phoneme recognition system based on the Listen, Attend and Spell model. The phoneme recognition model uses a word2vec model to initialize the embedding matrix, which improves performance by increasing the distance among the phoneme embedding vectors. At the same time, to address overfitting of the 61-phoneme recognition model on the TIMIT dataset, we propose a new training method: a 61-to-39 phoneme mapping table is used to inverse-map the phonemes of the dataset, generating additional 61-phoneme training data. At the end of training, the augmented data is replaced with the standard dataset for corrective training. Our model achieves the best result on the TIMIT dataset, 16.5% PER (Phoneme Error Rate).
2 citations
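The two ingredients the abstract describes, initializing an embedding matrix from word2vec vectors and inverse-mapping a 61-to-39 folding table to synthesize extra 61-phoneme transcripts, can be sketched roughly as follows. The phoneme inventory, vector values, and folding excerpt are illustrative toys, and actually training word2vec (e.g. with gensim) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
phonemes = ["sil", "aa", "ae", "k", "t"]   # toy inventory (TIMIT uses 61)
emb_dim = 8

# Stand-in for vectors learned by word2vec on phoneme transcriptions;
# here "t" is deliberately missing to exercise the fallback path.
pretrained = {p: rng.normal(size=emb_dim) for p in phonemes if p != "t"}

# Initialize the recognizer's embedding matrix: copy pretrained vectors,
# fall back to a small random init for phonemes word2vec never saw.
E = np.empty((len(phonemes), emb_dim))
for i, p in enumerate(phonemes):
    E[i] = pretrained.get(p, rng.normal(scale=0.01, size=emb_dim))

# Data augmentation: invert a 61->39 folding table (excerpt shown) so each
# 39-set phoneme maps to all its 61-set variants, then sample variants to
# synthesize extra 61-phoneme transcripts from a 39-phoneme one.
fold = {"zh": "sh", "sh": "sh", "ix": "ih", "ih": "ih"}   # 61 -> 39 (excerpt)
inverse = {}
for p61, p39 in fold.items():
    inverse.setdefault(p39, []).append(p61)
transcript39 = ["sh", "ih"]
augmented = [[rng.choice(inverse[p]) for p in transcript39] for _ in range(3)]
```

Each sampled variant still folds back to the original 39-set phoneme, so the synthesized 61-phoneme transcripts stay consistent with the source labels.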
••
21 Mar 2018
TL;DR: A systematic approach to keyword spotting (KWS) in continuous speech using a hybrid model based on an LSTM network in combination with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi.
Abstract: Recently, the Long Short-Term Memory (LSTM) architecture has been shown to outperform other state-of-the-art approaches, such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN), on many speech recognition tasks. The LSTM network aims to further improve the modeling of long-range temporal dynamics and to remedy the vanishing and exploding gradient problems of the conventional recurrent neural network (RNN). Motivated by the tremendous success of the LSTM, we present in this paper a systematic approach to keyword spotting (KWS) in continuous speech. The system operates in two stages: in the first, the continuous speech is decoded into a phonetic stream using a hybrid model based on an LSTM network in combination with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi; in the second, keywords are identified and detected in this phone sequence using a Classification and Regression Tree (CART) implemented in MATLAB. The work and experiments are conducted on the TIMIT dataset.
2 citations
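The second stage the abstract describes, finding keywords in the decoded phone sequence, can be illustrated with a toy sketch. The pronunciations below are hypothetical, and plain subsequence matching is substituted for the paper's CART classifier purely to show the data flow from decoded phones to keyword hits:

```python
# Hypothetical keyword lexicon: word -> phone pronunciation.
lexicon = {"water": ["w", "ao", "t", "axr"],
           "dark":  ["d", "aa", "r", "k"]}

def spot_keywords(phones, lexicon):
    """Return (keyword, start_index) for every pronunciation match in the
    phone sequence produced by the first-stage LSTM-HMM decoder."""
    hits = []
    for word, pron in lexicon.items():
        n = len(pron)
        for i in range(len(phones) - n + 1):
            if phones[i:i + n] == pron:
                hits.append((word, i))
    return hits

decoded = ["sil", "d", "aa", "r", "k", "w", "ao", "t", "axr", "sil"]
print(spot_keywords(decoded, lexicon))  # → [('water', 5), ('dark', 1)]
```

A trained classifier such as CART earns its keep over this exact matching by tolerating decoder substitutions and deletions in the phone stream.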
••
01 Jun 2000
TL;DR: The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes.
Abstract: In the hidden Markov modeling framework with mixture Gaussians, adaptation is often done by modifying the Gaussian mean vectors using MAP estimation or MLLR transformation. When the amount of adaptation data is scarce, or when some speech units are unseen in the data, it is necessary to do adaptation in groups, either with regression classes of Gaussians or via vector field smoothing. In this paper, we propose to derive regression classes of subspace Gaussians for MAP adaptation. The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes. Experiments in which context-dependent TIMIT HMMs are adapted to the Resource Management task with a few minutes of speech show the improvement of our subspace regression classes over traditional full-space regression classes.
2 citations
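A rough sketch of MAP mean adaptation with regression-class fallback, the mechanism the abstract builds on. All sizes, occupancies, and the prior weight are toy values, and the paper's subspace-Gaussian clustering is not reproduced; here an unseen Gaussian simply borrows the average mean shift of its regression class:

```python
import numpy as np

rng = np.random.default_rng(0)
G, D, tau = 3, 2, 10.0                    # Gaussians, feature dim, MAP prior weight
mu = rng.normal(size=(G, D))              # speaker-independent means
classes = np.array([0, 1, 0])             # regression class of each Gaussian

occ = np.array([5.0, 3.0, 0.0])           # per-Gaussian occupancy; Gaussian 2 unseen
acc = rng.normal(size=(G, D)) * occ[:, None]  # per-Gaussian sums of weighted frames

# MAP update where data exists: (tau*mu + sum_t gamma_t x_t) / (tau + occ).
mu_map = (tau * mu + acc) / (tau + occ)[:, None]

# Gaussians with no adaptation data fall back to the average mean shift
# of the adapted Gaussians in their regression class.
shift = mu_map - mu
for g in np.where(occ == 0)[0]:
    peers = (classes == classes[g]) & (occ > 0)
    if peers.any():
        mu_map[g] = mu[g] + shift[peers].mean(axis=0)
```

As tau grows the update trusts the speaker-independent prior more; as occupancy grows it trusts the adaptation data, which is the usual MAP interpolation behavior.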