Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared on this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
••
01 Feb 2013
TL;DR: The low-rank approximation methods provide a way to trade off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT for an increase in word error rate (WER) from 3.2% to 3.5%, and an overall speedup of up to 3.25 for DTW.
Abstract: Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required in many speech processing systems and is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has far less impact on recognition accuracy. Experiments on the 1,138-word-vocabulary RM1 task and the 6,224-word-vocabulary TIMIT task using the Sphinx 3.7 system show that, in a typical case, the matrix-multiplication-based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way to trade off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase in word error rate (WER) from 3.2% to 3.5% for RM1 and no increase in WER for TIMIT. We also express the pairwise Euclidean distance computation phase of Dynamic Time Warping (DTW) in terms of matrix multiplication, saving approximately $\frac{1}{3}$ of the computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
2 citations
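The abstract above describes folding diagonal-Gaussian likelihood evaluation into a single matrix multiplication over augmented feature vectors, and the same algebraic trick for DTW's pairwise Euclidean distances. A minimal NumPy sketch of both ideas, using toy sizes and random parameters rather than the paper's Sphinx setup:

```python
import numpy as np

rng = np.random.default_rng(0)
D, G, T = 3, 4, 5                         # feature dim, Gaussians, frames
X = rng.normal(size=(T, D))               # feature vectors, one per frame
mu = rng.normal(size=(G, D))              # Gaussian means
var = rng.uniform(0.5, 2.0, size=(G, D))  # diagonal covariances

# Reference: direct per-pair diagonal-Gaussian log-likelihoods.
def loglik_direct(X, mu, var):
    out = np.empty((T, G))
    for t in range(T):
        for g in range(G):
            out[t, g] = (-0.5 * D * np.log(2 * np.pi)
                         - 0.5 * np.log(var[g]).sum()
                         - 0.5 * ((X[t] - mu[g]) ** 2 / var[g]).sum())
    return out

# Matrix form: augment each frame as [x^2, x, 1] and fold each Gaussian's
# quadratic, linear, and constant terms into one parameter row, so all
# T x G log-likelihoods come out of a single matmul.
A = np.hstack([X ** 2, X, np.ones((T, 1))])            # (T, 2D+1)
const = (-0.5 * D * np.log(2 * np.pi)
         - 0.5 * np.log(var).sum(axis=1)
         - 0.5 * (mu ** 2 / var).sum(axis=1))          # (G,)
B = np.hstack([-0.5 / var, mu / var, const[:, None]])  # (G, 2D+1)
L = A @ B.T                                            # (T, G)
assert np.allclose(L, loglik_direct(X, mu, var))

# Same idea for DTW's pairwise squared Euclidean distances:
# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the cross terms are one matmul.
Y = rng.normal(size=(6, D))
D2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
ref = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2, ref)
```

The augmented row [x², x, 1] absorbs the quadratic, linear, and constant terms of each Gaussian's log-density, which is why one dense matrix product yields every frame-by-Gaussian score at once.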
••
15 Feb 2020
TL;DR: This paper builds a phoneme recognition system based on the Listen, Attend and Spell model and uses a word2vec model to initialize the embedding matrix, improving performance by increasing the distance among the phoneme embedding vectors.
Abstract: In this paper, we present how to hybridize a word2vec model and an attention-based end-to-end speech recognition model. We build a phoneme recognition system based on the Listen, Attend and Spell model. The phoneme recognition model uses a word2vec model to initialize the embedding matrix, which improves performance by increasing the distance among the phoneme embedding vectors. At the same time, to address overfitting of the 61-phoneme recognition model on the TIMIT dataset, we propose a new training method: a 61-to-39 phoneme mapping table is used to inverse-map the phonemes of the dataset, generating additional 61-phoneme training data. At the end of training, the augmented data is replaced with the standard dataset for corrective training. Our model achieves the best result on the TIMIT dataset, 16.5% PER (Phoneme Error Rate).
2 citations
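The two ingredients the abstract describes, initializing an embedding matrix from word2vec vectors and inverse-mapping a 61-to-39 folding table to synthesize extra 61-phoneme transcripts, can be sketched roughly as follows. The phoneme inventory, vector values, and folding excerpt are illustrative toys, and actually training word2vec (e.g. with gensim) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
phonemes = ["sil", "aa", "ae", "k", "t"]   # toy inventory (TIMIT uses 61)
emb_dim = 8

# Stand-in for vectors learned by word2vec on phoneme transcriptions;
# here "t" is deliberately missing to exercise the fallback path.
pretrained = {p: rng.normal(size=emb_dim) for p in phonemes if p != "t"}

# Initialize the recognizer's embedding matrix: copy pretrained vectors,
# fall back to a small random init for phonemes word2vec never saw.
E = np.empty((len(phonemes), emb_dim))
for i, p in enumerate(phonemes):
    E[i] = pretrained.get(p, rng.normal(scale=0.01, size=emb_dim))

# Data augmentation: invert a 61->39 folding table (excerpt shown) so each
# 39-set phoneme maps to all its 61-set variants, then sample variants to
# synthesize extra 61-phoneme transcripts from a 39-phoneme one.
fold = {"zh": "sh", "sh": "sh", "ix": "ih", "ih": "ih"}   # 61 -> 39 (excerpt)
inverse = {}
for p61, p39 in fold.items():
    inverse.setdefault(p39, []).append(p61)
transcript39 = ["sh", "ih"]
augmented = [[rng.choice(inverse[p]) for p in transcript39] for _ in range(3)]
```

Each sampled variant still folds back to the original 39-set phoneme, so the synthesized 61-phoneme transcripts stay consistent with the source labels.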
••
21 Mar 2018
TL;DR: A systematic approach to keyword spotting (KWS) in continuous speech using a hybrid model based on an LSTM network in combination with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi.
Abstract: Recently, the Long Short-Term Memory (LSTM) architecture has been shown to outperform other state-of-the-art approaches, such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN), on many speech recognition tasks. The LSTM network aims to further improve the modeling of long-range temporal dynamics and to remedy the vanishing and exploding gradient problems of the conventional recurrent neural network (RNN). Motivated by the tremendous success of the LSTM, we present in this paper a systematic approach to keyword spotting (KWS) in continuous speech. The system operates in two stages: in the first, the continuous speech is decoded into a phonetic stream using a hybrid model based on an LSTM network in combination with a Hidden Markov Model (HMM), built with the open-source speech recognition toolkit Kaldi; in the second, keywords are identified and detected in this phone sequence using a Classification and Regression Tree (CART) implemented in MATLAB. The work and experiments are conducted on the TIMIT dataset.
2 citations
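The second stage the abstract describes, finding keywords in the decoded phone sequence, can be illustrated with a toy sketch. The pronunciations below are hypothetical, and plain subsequence matching is substituted for the paper's CART classifier purely to show the data flow from decoded phones to keyword hits:

```python
# Hypothetical keyword lexicon: word -> phone pronunciation.
lexicon = {"water": ["w", "ao", "t", "axr"],
           "dark":  ["d", "aa", "r", "k"]}

def spot_keywords(phones, lexicon):
    """Return (keyword, start_index) for every pronunciation match in the
    phone sequence produced by the first-stage LSTM-HMM decoder."""
    hits = []
    for word, pron in lexicon.items():
        n = len(pron)
        for i in range(len(phones) - n + 1):
            if phones[i:i + n] == pron:
                hits.append((word, i))
    return hits

decoded = ["sil", "d", "aa", "r", "k", "w", "ao", "t", "axr", "sil"]
print(spot_keywords(decoded, lexicon))  # → [('water', 5), ('dark', 1)]
```

A trained classifier such as CART earns its keep over this exact matching by tolerating decoder substitutions and deletions in the phone stream.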
••
01 Jun 2000
TL;DR: The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes.
Abstract: In the hidden Markov modeling framework with mixture Gaussians, adaptation is often done by modifying the Gaussian mean vectors using MAP estimation or MLLR transformation. When the amount of adaptation data is scarce, or when some speech units are unseen in the data, it is necessary to do adaptation in groups, either with regression classes of Gaussians or via vector field smoothing. In this paper, we propose to derive regression classes of subspace Gaussians for MAP adaptation. The motivation is that clustering at the finer acoustic level of subspace Gaussians of lower dimension is more effective, resulting in lower distortions and relatively fewer regression classes. Experiments in which context-dependent TIMIT HMMs are adapted to the Resource Management task with a few minutes of speech show the improvement of our subspace regression classes over traditional full-space regression classes.
2 citations
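A rough sketch of MAP mean adaptation with regression-class fallback, the mechanism the abstract builds on. All sizes, occupancies, and the prior weight are toy values, and the paper's subspace-Gaussian clustering is not reproduced; here an unseen Gaussian simply borrows the average mean shift of its regression class:

```python
import numpy as np

rng = np.random.default_rng(0)
G, D, tau = 3, 2, 10.0                    # Gaussians, feature dim, MAP prior weight
mu = rng.normal(size=(G, D))              # speaker-independent means
classes = np.array([0, 1, 0])             # regression class of each Gaussian

occ = np.array([5.0, 3.0, 0.0])           # per-Gaussian occupancy; Gaussian 2 unseen
acc = rng.normal(size=(G, D)) * occ[:, None]  # per-Gaussian sums of weighted frames

# MAP update where data exists: (tau*mu + sum_t gamma_t x_t) / (tau + occ).
mu_map = (tau * mu + acc) / (tau + occ)[:, None]

# Gaussians with no adaptation data fall back to the average mean shift
# of the adapted Gaussians in their regression class.
shift = mu_map - mu
for g in np.where(occ == 0)[0]:
    peers = (classes == classes[g]) & (occ > 0)
    if peers.any():
        mu_map[g] = mu[g] + shift[peers].mean(axis=0)
```

As tau grows the update trusts the speaker-independent prior more; as occupancy grows it trusts the adaptation data, which is the usual MAP interpolation behavior.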