
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: In this paper, the authors leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training, achieving state-of-the-art performance on the TIMIT benchmark.
Abstract: Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe. This paper describes wav2vec-U, short for wav2vec Unsupervised, a method to train speech recognition models without any labeled data. We leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training. The right representations are key to the success of our method. Compared to the best previous unsupervised work, wav2vec-U reduces the phoneme error rate on the TIMIT benchmark from 26.1 to 11.3. On the larger English Librispeech benchmark, wav2vec-U achieves a word error rate of 5.9 on test-other, rivaling some of the best published systems trained on 960 hours of labeled data from only two years ago. We also experiment on nine other languages, including low-resource languages such as Kyrgyz, Swahili and Tatar.

25 citations
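The core mechanism described in the abstract above, mapping segmented self-supervised representations to phonemes via adversarial training, can be sketched as a small GAN. The layer choices, sizes, and hinge loss below are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the wav2vec-U idea: a generator maps segmented
# self-supervised speech representations to phoneme distributions, and a
# discriminator is trained adversarially against unpaired phoneme text.
# All architecture and training details here are illustrative assumptions.
import torch
import torch.nn as nn

N_PHONEMES = 40   # assumed phoneme inventory size
REP_DIM = 512     # assumed dimensionality of the speech representations

generator = nn.Sequential(            # speech segment -> phoneme distribution
    nn.Conv1d(REP_DIM, N_PHONEMES, kernel_size=3, padding=1),
)
discriminator = nn.Sequential(        # phoneme sequence -> real/fake score
    nn.Conv1d(N_PHONEMES, 256, kernel_size=3, padding=1),
    nn.GELU(),
    nn.Conv1d(256, 1, kernel_size=3, padding=1),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def one_hot_text(batch):
    # unpaired phoneme text (B, T) as one-hot sequences (B, N_PHONEMES, T)
    return nn.functional.one_hot(batch, N_PHONEMES).float().transpose(1, 2)

def train_step(speech_segments, phoneme_text):
    # speech_segments: (B, REP_DIM, T) pooled segment representations
    # phoneme_text:    (B, T) phoneme ids from unpaired text
    fake = torch.softmax(generator(speech_segments), dim=1)
    real = one_hot_text(phoneme_text)

    # discriminator step: real text scores high, generated output scores low
    d_loss = (torch.relu(1 - discriminator(real)).mean()
              + torch.relu(1 + discriminator(fake.detach())).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator step: fool the discriminator
    g_loss = -discriminator(fake).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```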

Journal ArticleDOI
TL;DR: An effective algorithm is proposed for automatic speech recognition using speech trajectories reconstructed in phase space; useful features are extracted from the recurrence plot (RP) of the speech signals embedded in the reconstructed phase space (RPS) by applying a two-dimensional wavelet transform to the resulting RP diagrams.

25 citations
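A compact sketch of the pipeline the TL;DR above describes: delay-embed the signal into a reconstructed phase space, threshold pairwise distances into a recurrence plot, and summarize the plot with a 2-D wavelet transform (here via PyWavelets). The embedding dimension, delay, threshold, and wavelet are illustrative assumptions.

```python
# Minimal sketch of recurrence-plot features for a speech frame.
import numpy as np
import pywt  # PyWavelets

def delay_embed(x, dim=3, tau=5):
    """Delay-coordinate embedding: rows are points in the phase space."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

def recurrence_plot(x, dim=3, tau=5, eps=0.1):
    """Binary recurrence plot: R[i, j] = 1 if points i and j are close."""
    pts = delay_embed(x, dim, tau)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return (dists < eps * dists.max()).astype(float)

def rp_wavelet_features(x):
    """Energy of each 2-D wavelet subband of the recurrence plot."""
    rp = recurrence_plot(x)
    cA, (cH, cV, cD) = pywt.dwt2(rp, "db2")
    return np.array([np.sum(c ** 2) for c in (cA, cH, cV, cD)])

frame = np.sin(0.1 * np.arange(400)) + 0.05 * np.random.randn(400)
print(rp_wavelet_features(frame))  # 4 subband energies for one frame
```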

Proceedings Article
01 Apr 2014
TL;DR: In this article, a primal-dual training method was proposed that formulates the learning of the RNN as a formal optimization problem with an inequality constraint providing a sufficient condition for the stability of the network dynamics.
Abstract: We present an architecture of a recurrent neural network (RNN) with a fully-connected deep neural network (DNN) as its feature extractor. The RNN is equipped with both causal temporal prediction and non-causal look-ahead, via auto-regression (AR) and moving-average (MA), respectively. The focus of this paper is a primal-dual training method that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics. Experimental results demonstrate the effectiveness of this new method, which achieves an 18.86% phone recognition error on the TIMIT core test set. This approaches the best result of 17.7%, obtained using an RNN with long short-term memory (LSTM). The results also show that the proposed primal-dual training method produces lower recognition errors than earlier popular RNN methods based on a carefully tuned threshold parameter that heuristically prevents the gradient from exploding.

24 citations
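A minimal sketch of the primal-dual idea from the abstract above: treat training as constrained optimization, with an inequality constraint on the recurrent weights standing in for the stability condition, and alternate a primal gradient step with a projected dual ascent step. The spectral-norm constraint, model, and step sizes are assumptions for illustration; the paper's exact constraint and architecture differ.

```python
# Primal-dual training sketch: minimize the task loss subject to
# g(W) = ||W_hh||_2 - 1 <= 0 (a sufficient condition for stable dynamics),
# via gradient descent on the weights and projected ascent on a dual variable.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=40, hidden_size=128, batch_first=True)
readout = nn.Linear(128, 61)                 # 61 TIMIT phone classes
params = list(rnn.parameters()) + list(readout.parameters())
opt = torch.optim.SGD(params, lr=1e-2)
lam = torch.tensor(0.0)                      # dual variable, kept >= 0
DUAL_LR = 1e-2

def train_step(x, y):
    global lam
    h, _ = rnn(x)                            # x: (B, T, 40), y: (B, T) ids
    loss = nn.functional.cross_entropy(readout(h).flatten(0, 1), y.flatten())

    # inequality constraint on the recurrent weight matrix
    g = torch.linalg.matrix_norm(rnn.weight_hh_l0, ord=2) - 1.0

    # primal step: descend on the Lagrangian L = loss + lam * g
    opt.zero_grad()
    (loss + lam * g).backward()
    opt.step()

    # dual step: ascend on g, projected onto lam >= 0
    lam = torch.clamp(lam + DUAL_LR * g.detach(), min=0.0)
    return loss.item(), g.item()
```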

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This paper studies how to conduct effective speaker code based speaker adaptation on RNN-BLSTM and demonstrates that the speaker code based adaptation method is also a valid adaptation method for RNN-BLSTM.
Abstract: Recently, the recurrent neural network with bidirectional Long Short-Term Memory (RNN-BLSTM) acoustic model has been shown to give great performance on TIMIT [1] and other speech recognition tasks. Meanwhile, the speaker code based adaptation method has been demonstrated to be a valid adaptation method for the Deep Neural Network (DNN) acoustic model [2]. However, to the best of our knowledge, whether the speaker code based adaptation method is also valid for RNN-BLSTM has not been reported. In this paper, we study how to conduct effective speaker code based speaker adaptation on RNN-BLSTM and demonstrate that the method is also valid for this architecture. Experimental results on TIMIT show that the adaptation of RNN-BLSTM achieves over 10% relative reduction in phone error rate (PER) compared to the unadapted model. A set of comparative experiments is then conducted to analyze the respective contributions of adapting the cell input and each gate activation function of the BLSTM. We find that adapting the cell input activation function is more effective than adapting each gate activation function.

24 citations
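A hand-written LSTM step makes the adaptation site described above visible: a learned per-speaker code is projected and added inside the cell-input activation, the injection point the paper found most effective. The layer sizes, code dimension, and embedding-table formulation are illustrative assumptions, and a single forward direction is shown rather than a full BLSTM.

```python
# Speaker-code adaptation sketch: inject a projected speaker code into the
# cell-input (g) activation of an LSTM step. Sizes are assumptions.
import torch
import torch.nn as nn

IN, HID, CODE, N_SPEAKERS = 40, 128, 50, 462  # 462 TIMIT training speakers

class SpeakerAdaptedLSTMCell(nn.Module):
    def __init__(self):
        super().__init__()
        self.ih = nn.Linear(IN, 4 * HID)
        self.hh = nn.Linear(HID, 4 * HID)
        self.codes = nn.Embedding(N_SPEAKERS, CODE)  # one code per speaker
        self.code_proj = nn.Linear(CODE, HID)        # code -> cell input

    def forward(self, x, h, c, speaker_id):
        gates = self.ih(x) + self.hh(h)
        i, f, g, o = gates.chunk(4, dim=-1)
        # inject the projected speaker code into the cell-input activation
        g = torch.tanh(g + self.code_proj(self.codes(speaker_id)))
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

cell = SpeakerAdaptedLSTMCell()
x = torch.randn(8, IN)                    # one frame for a batch of 8
h = c = torch.zeros(8, HID)
spk = torch.randint(0, N_SPEAKERS, (8,))  # speaker id of each utterance
h, c = cell(x, h, c, spk)
```

At adaptation time one would typically freeze the trained network and estimate only the new speaker's code by backpropagation.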

Proceedings Article
01 Jan 2006
TL;DR: A new method for phoneme sequence recognition given a speech utterance is described, which is not based on the HMM; it uses a discriminative kernel-based training procedure in which the learning process is tailored to minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence.
Abstract: We describe a new method for phoneme sequence recognition given a speech utterance, which is not based on the HMM. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance, along with a proposed phoneme sequence, to a vector space endowed with an inner product realized by a Mercer kernel. Building on large-margin techniques for predicting whole sequences, we devise a learning algorithm that reduces to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and an efficient implementation of it. We present initial encouraging experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.

24 citations
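Central to the method above is training against the Levenshtein distance rather than frame error. A minimal sketch of that loss notion: the edit distance itself, and a cost-sensitive hinge in which the correct sequence must outscore a competitor by a margin equal to their edit distance. The score function here is a hypothetical placeholder for the paper's kernel-based sequence scorer.

```python
# Levenshtein distance and a cost-sensitive hinge loss, as a sketch of the
# training objective; `score` is a hypothetical stand-in for the kernel model.
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (insert/delete/sub)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (pa != pb)))    # substitution
        prev = cur
    return prev[-1]

def hinge_loss(score, x, y_true, y_pred):
    """The correct sequence must beat the predicted one by a margin equal
    to their edit distance; zero loss once that margin is achieved."""
    margin = levenshtein(y_true, y_pred)
    return max(0.0, margin - score(x, y_true) + score(x, y_pred))

print(levenshtein("sil k ae t sil".split(), "sil k ah t sil".split()))  # 1
```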


Network Information
Related Topics (5)

Topic                        Papers   Citations   Related
Recurrent neural network     29.2K    890K        76%
Feature (machine learning)   33.9K    798.7K      75%
Feature vector               48.8K    954.4K      74%
Natural language             31.1K    806.8K      73%
Deep learning                79.8K    2.1M        72%
Performance
Metrics
No. of papers in the topic in previous years
Year   Papers
2023   24
2022   62
2021   67
2020   86
2019   77
2018   95