Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Book Chapter · DOI
18 Sep 2018
TL;DR: In this article, the authors investigated recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and regularization post-layer.
Abstract: In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its popularity and broad availability in the community; it also simulates a low-resource scenario, which makes it relevant for minor languages. We further prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large-vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down error rates in automatic speech recognition, but no clear winner has emerged among the proposed architectures. Dropout has been the regularization technique of choice in most cases, while its combination with other regularization techniques and with model ensembles has been left unexplored. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate of 14.84% (minimum 14.69%) over 10 runs on the core test set, slightly lower than the best published PER to date, to our knowledge. Finally, in contrast to most papers, we publish open-source scripts so the results can easily be replicated and development can continue. (A sketch of the zoneout update follows this entry.)

3 citations
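
The zoneout technique named above differs from dropout in that hidden units are carried over from the previous timestep rather than zeroed. As a minimal illustration (our sketch, not the authors' released scripts; the function name and default rate are ours):

    import numpy as np

    def zoneout_step(h_prev, h_new, p=0.1, training=True):
        """Zoneout: each hidden unit keeps its previous value with
        probability p instead of taking the freshly computed update,
        so gradients can still flow through the preserved units."""
        if training:
            keep = (np.random.rand(*h_prev.shape) < p).astype(h_prev.dtype)
            return keep * h_prev + (1.0 - keep) * h_new
        # At test time, use the expected value of the stochastic mix.
        return p * h_prev + (1.0 - p) * h_new

Dropout, by contrast, masks and rescales the new activations; combining the two is part of what the paper investigates.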

Proceedings Article · DOI
09 Sep 2012
TL;DR: In this paper, word-level hidden Markov models (HMMs) are proposed to supplement state-of-the-art phone-based acoustic modeling and enhance the performance of an automatic speech recognition (ASR) system.
Abstract: In this paper, we propose word-level hidden Markov models (HMMs) to supplement state-of-the-art phone-based acoustic modeling and enhance the performance of an automatic speech recognition (ASR) system. Each word in the vocabulary is initially modeled by well-trained triphone models. Maximum a posteriori (MAP) adaptation is then applied to generate models for words with a large number of occurrences in the training set, so that the acoustic distribution of those words can be modeled more precisely. Experimental results show that the proposed word-based systems outperform phone-based systems on the TIMIT task, which has a small training corpus. In tasks with plenty of training data, such as the WSJ task, word-based systems still improve over phone-based systems. Furthermore, word-based systems discriminate short words and homophones better, and they are more robust to language-model weight variation than conventional phone-based systems. (A sketch of the MAP mean update follows this entry.)

3 citations
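
The MAP adaptation step described above has a standard closed form for Gaussian means: the adapted mean moves from the prior mean toward the data mean according to how many frames the word actually received. A minimal sketch, assuming diagonal-covariance Gaussians and an illustrative relevance factor tau (names and defaults are ours, not the paper's):

    import numpy as np

    def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
        """MAP re-estimate of a Gaussian mean: interpolate between
        the data mean and the prior mean, weighted by the soft count
        of frames assigned to this Gaussian."""
        gamma = posteriors.sum()                       # soft frame count
        data_mean = (posteriors[:, None] * frames).sum(axis=0) / max(gamma, 1e-8)
        w = gamma / (gamma + tau)                      # shrinkage weight
        return w * data_mean + (1.0 - w) * prior_mean

Frequent words accumulate a large gamma and move toward their own data mean; rare words stay close to the triphone-derived prior, which matches the paper's choice to adapt only words with many training occurrences.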

Journal Article · DOI
TL;DR: A new segmental k-NN-based phoneme recognition technique that takes advantage of a similarity search algorithm called Spatial Approximate Sample Hierarchy (SASH). SASH makes it feasible to represent phonemes with high-dimensional feature vectors; the proposed algorithm is evaluated using only the best hypothesis for every segment. (A sketch of the k-NN vote follows this entry.)

3 citations
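
SASH itself is an approximate similarity-search index, which is what makes high-dimensional segment vectors tractable; the k-NN decision it feeds is simple. A minimal sketch with exact brute-force search standing in for the SASH index (all names are illustrative):

    import numpy as np
    from collections import Counter

    def knn_classify_segment(query_vec, train_vecs, train_labels, k=5):
        """Label one segment by majority vote over its k nearest
        training segments (brute force stands in for SASH here)."""
        dists = np.linalg.norm(train_vecs - query_vec, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        return votes.most_common(1)[0][0]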

Proceedings Article · DOI
08 Sep 2016
TL;DR: Experimental results show that the proposed method, based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.
Abstract: Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variation, human error, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset. (A rough sketch of the unit-clustering idea follows this entry.)

3 citations
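
The paper's unit discovery is a semi-supervised, DNN-based procedure that cannot be reproduced in a few lines; as a very rough illustration of the underlying idea (deriving sub-word classes from data instead of a handcrafted dictionary), here is a k-means sketch. This is our simplification, not the authors' method:

    import numpy as np
    from sklearn.cluster import KMeans

    def discover_units(frame_features, n_units=64):
        """Cluster acoustic frames into n_units data-driven classes;
        each cluster id acts as a pseudo sub-word unit label."""
        km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
        return km.fit_predict(frame_features)  # one unit id per frame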

Proceedings Article · DOI
25 Aug 2013
TL;DR: In this paper, context-dependent (CD) triphone states were introduced to model co-articulation and pronunciation mismatches arising from an imperfect lexicon, and two feature-space speaker normalization methods well known from GMM-based modeling, namely mean & variance normalization and vocal tract length normalization, were ported to Reservoir Computing.
Abstract: Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach has obtained quite promising results. In this work, we elaborate on this concept by porting to Reservoir Computing some well-known techniques for enhancing the recognition rates of GMM-based models. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also incorporate two speaker normalization methods in the feature space, namely mean & variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM- and RC-based likelihoods at the state level, these scores can be reduced further. (A sketch of mean & variance normalization follows this entry.)

3 citations
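
Of the two speaker normalizations mentioned above, mean & variance normalization is straightforward to sketch; VTLN, which warps the frequency axis per speaker, is omitted here. A minimal per-utterance version, assuming a (frames x dims) feature array (names are ours):

    import numpy as np

    def mean_variance_normalize(feats, eps=1e-8):
        """Shift each feature dimension to zero mean and scale it to
        unit variance over the utterance, reducing channel and
        speaker differences before acoustic modeling."""
        mu = feats.mean(axis=0)
        sigma = feats.std(axis=0)
        return (feats - mu) / (sigma + eps)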


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95