Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Book Chapter · DOI
18 Sep 2018
TL;DR: In this article, the authors investigated recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and regularization post-layer.
Abstract: In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task due to its popularity and broad availability in the community; it also simulates a low-resource scenario, which makes it relevant for minor languages. We further prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large-vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down error rates in automatic speech recognition, but no clear winner has emerged among the proposed architectures. Dropout has been the regularization technique of choice in most cases, while its combination with other regularization techniques and with model ensembles has been left unexplored. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate of 14.84% (minimum 14.69%) over 10 runs on the core test set, slightly lower than the best published PER to date, to our knowledge. Finally, in contrast to most papers, we publish open-source scripts so the results can easily be replicated and development can continue. (A sketch of the zoneout update follows this entry.)

3 citations
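
The zoneout technique named above differs from dropout in that hidden units are carried over from the previous timestep rather than zeroed. As a minimal illustration (our sketch, not the authors' released scripts; the function name and default rate are ours):

    import numpy as np

    def zoneout_step(h_prev, h_new, p=0.1, training=True):
        """Zoneout: each hidden unit keeps its previous value with
        probability p instead of taking the freshly computed update,
        so gradients can still flow through the preserved units."""
        if training:
            keep = (np.random.rand(*h_prev.shape) < p).astype(h_prev.dtype)
            return keep * h_prev + (1.0 - keep) * h_new
        # At test time, use the expected value of the stochastic mix.
        return p * h_prev + (1.0 - p) * h_new

Dropout, by contrast, masks and rescales the new activations; combining the two is part of what the paper investigates.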

Proceedings Article · DOI
09 Sep 2012
TL;DR: In this paper, word-level hidden Markov models (HMMs) are proposed to supplement state-of-the-art phone-based acoustic modeling and enhance the performance of an automatic speech recognition (ASR) system.
Abstract: In this paper, we propose word-level hidden Markov models (HMMs) to supplement state-of-the-art phone-based acoustic modeling and enhance the performance of an automatic speech recognition (ASR) system. Each word in the vocabulary is initially modeled by well-trained triphone models. Maximum a posteriori (MAP) adaptation is then applied to generate models for words with a large number of occurrences in the training set, so that the acoustic distribution of those words can be modeled more precisely. Experimental results show that the proposed word-based systems outperform phone-based systems on the TIMIT task, which has a small training corpus. In tasks with plenty of training data, such as the WSJ task, word-based systems still improve over phone-based systems. Furthermore, word-based systems discriminate short words and homophones better, and they are more robust to language-model weight variation than conventional phone-based systems. (A sketch of the MAP mean update follows this entry.)

3 citations
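
The MAP adaptation step described above has a standard closed form for Gaussian means: the adapted mean moves from the prior mean toward the data mean according to how many frames the word actually received. A minimal sketch, assuming diagonal-covariance Gaussians and an illustrative relevance factor tau (names and defaults are ours, not the paper's):

    import numpy as np

    def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
        """MAP re-estimate of a Gaussian mean: interpolate between
        the data mean and the prior mean, weighted by the soft count
        of frames assigned to this Gaussian."""
        gamma = posteriors.sum()                       # soft frame count
        data_mean = (posteriors[:, None] * frames).sum(axis=0) / max(gamma, 1e-8)
        w = gamma / (gamma + tau)                      # shrinkage weight
        return w * data_mean + (1.0 - w) * prior_mean

Frequent words accumulate a large gamma and move toward their own data mean; rare words stay close to the triphone-derived prior, which matches the paper's choice to adapt only words with many training occurrences.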

Journal Article · DOI
TL;DR: A new segmental k-NN-based phoneme recognition technique that takes advantage of a similarity search algorithm called Spatial Approximate Sample Hierarchy (SASH). SASH makes it feasible to represent phonemes with high-dimensional feature vectors; the proposed algorithm is evaluated using only the best hypothesis for every segment. (A sketch of the k-NN vote follows this entry.)

3 citations
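
SASH itself is an approximate similarity-search index, which is what makes high-dimensional segment vectors tractable; the k-NN decision it feeds is simple. A minimal sketch with exact brute-force search standing in for the SASH index (all names are illustrative):

    import numpy as np
    from collections import Counter

    def knn_classify_segment(query_vec, train_vecs, train_labels, k=5):
        """Label one segment by majority vote over its k nearest
        training segments (brute force stands in for SASH here)."""
        dists = np.linalg.norm(train_vecs - query_vec, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        return votes.most_common(1)[0][0]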

Proceedings Article · DOI
08 Sep 2016
TL;DR: Experimental results show that the proposed method, based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.
Abstract: Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variation, human error, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset. (A rough sketch of the unit-clustering idea follows this entry.)

3 citations
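
The paper's unit discovery is a semi-supervised, DNN-based procedure that cannot be reproduced in a few lines; as a very rough illustration of the underlying idea (deriving sub-word classes from data instead of a handcrafted dictionary), here is a k-means sketch. This is our simplification, not the authors' method:

    import numpy as np
    from sklearn.cluster import KMeans

    def discover_units(frame_features, n_units=64):
        """Cluster acoustic frames into n_units data-driven classes;
        each cluster id acts as a pseudo sub-word unit label."""
        km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
        return km.fit_predict(frame_features)  # one unit id per frame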

Proceedings Article · DOI
25 Aug 2013
TL;DR: In this paper, context-dependent (CD) triphone states were introduced to model co-articulation and pronunciation mismatches arising from an imperfect lexicon, and two feature-space speaker normalization methods well known from GMM-based modeling, namely mean & variance normalization and vocal tract length normalization, were ported to Reservoir Computing.
Abstract: Reservoir Computing (RC) has recently been introduced as an interesting alternative for acoustic modeling. For phone and continuous digit recognition, the reservoir approach has obtained quite promising results. In this work, we elaborate on this concept by porting to Reservoir Computing some well-known techniques for enhancing the recognition rates of GMM-based models. In particular, we introduce context-dependent (CD) triphone states to model co-articulation and pronunciation mismatches arising from an imperfect lexicon. We also incorporate two speaker normalization methods in the feature space, namely mean & variance normalization and vocal tract length normalization. The impact of the investigated techniques is studied in the context of phone recognition on the TIMIT corpus. Our CD-RC-HMM hybrid yields a speaker-independent phone error rate (PER) of 22% and a speaker-dependent PER of 20.5%. By combining GMM- and RC-based likelihoods at the state level, these scores can be reduced further. (A sketch of mean & variance normalization follows this entry.)

3 citations
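
Of the two speaker normalizations mentioned above, mean & variance normalization is straightforward to sketch; VTLN, which warps the frequency axis per speaker, is omitted here. A minimal per-utterance version, assuming a (frames x dims) feature array (names are ours):

    import numpy as np

    def mean_variance_normalize(feats, eps=1e-8):
        """Shift each feature dimension to zero mean and scale it to
        unit variance over the utterance, reducing channel and
        speaker differences before acoustic modeling."""
        mu = feats.mean(axis=0)
        sigma = feats.std(axis=0)
        return (feats - mu) / (sigma + eps)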


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95