Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
Geoffrey Zweig
25 Mar 2012
TL;DR: Initial steps are taken toward using segment-based direct models on their own, first by developing a segment-based maximum entropy phone classifier, and then by utilizing the features in a segmental conditional random field for recognition.
Abstract: Segment-based direct models have recently been used to improve the output of existing state-of-the-art speech recognizers. To date, however, they have relied on an existing HMM system to provide segment boundaries. This paper takes initial steps toward using these models on their own, first by developing a segment-based maximum entropy phone classifier, and then by utilizing the features in a segmental conditional random field for recognition. To produce a feature representation that is independent of segment length, we utilize a set of n-gram features based on vector-quantized representations of the acoustic input. We find that the models are able to integrate information at different granularities and from different streams. Contextual information from around the segment boundaries is particularly important. We obtain competitive results for TIMIT phone classification, and present initial recognition results.
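As a concrete illustration of the length-independent representation, the sketch below builds n-gram count features over vector-quantized acoustic frames. This is not the authors' implementation: the codebook size, the 13-dimensional frame vectors, and the use of scikit-learn's KMeans as the quantizer are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code): fixed-length segment features from
# n-grams of vector-quantized acoustic frames.
import numpy as np
from sklearn.cluster import KMeans

def train_quantizer(frames, codebook_size=256, seed=0):
    """Fit a VQ codebook on acoustic frame vectors (e.g. MFCC frames)."""
    return KMeans(n_clusters=codebook_size, random_state=seed, n_init=10).fit(frames)

def segment_ngram_features(segment_frames, quantizer, n=2):
    """Map a variable-length segment to a fixed-length bag of VQ n-grams."""
    codes = quantizer.predict(segment_frames)        # one discrete symbol per frame
    k = quantizer.n_clusters
    feats = np.zeros(k ** n)                         # one bin per possible n-gram
    for i in range(len(codes) - n + 1):
        idx = 0
        for c in codes[i:i + n]:                     # flatten the n-gram into a bin index
            idx = idx * k + int(c)
        feats[idx] += 1.0
    return feats / max(1, len(codes) - n + 1)        # normalize out the segment length

# Toy usage: train an 8-symbol codebook, then featurize a 20-frame "segment".
rng = np.random.default_rng(0)
quantizer = train_quantizer(rng.normal(size=(300, 13)), codebook_size=8)
print(segment_ngram_features(rng.normal(size=(20, 13)), quantizer).shape)  # (64,)
```

Because the counts are normalized by the number of n-grams in the segment, segments of different durations yield feature vectors of the same dimensionality and comparable scale.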

25 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: Regularization effectively prevents overfitting, and HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions.
Abstract: We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that the use of regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply Maximum a Posteriori adaptation to HCRFs, decreasing the phone classification error rate in the Switchboard corpus by around 1%-5% given only small amounts of adaptation data.
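For background on the regularization the paper discusses, HCRF training is commonly formulated as maximizing the conditional log-likelihood under a Gaussian prior on the parameters (an L2 penalty); the formulation below is a standard one shown for illustration, and its details (e.g. a single shared variance) may differ from the paper's exact setup.

```latex
% Regularized HCRF objective (illustrative form): conditional log-likelihood
% of the labels plus a Gaussian prior (L2 penalty) on the parameters \lambda,
% with hidden state sequences \mathbf{h} summed out.
\mathcal{L}(\lambda) = \sum_{i=1}^{N} \log p(y_i \mid \mathbf{x}_i; \lambda)
  - \frac{\lVert \lambda \rVert^2}{2\sigma^2},
\qquad
p(y \mid \mathbf{x}; \lambda) =
  \frac{\sum_{\mathbf{h}} \exp\big(\lambda^{\top} f(y, \mathbf{h}, \mathbf{x})\big)}
       {\sum_{y'} \sum_{\mathbf{h}} \exp\big(\lambda^{\top} f(y', \mathbf{h}, \mathbf{x})\big)}
```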

25 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This is the first paper to describe these various flavors of TDNN in depth, providing details regarding the speech features used at input, the constituent layers and their dimensionality, regularization techniques, etc.
Abstract: Kaldi NNET3 is at the moment the leading speech recognition toolkit on many well-known tasks such as LibriSpeech, TED-LIUM or TIMIT. Several versions of the time-delay neural network (TDNN) architecture were recently proposed, implemented and evaluated for acoustic modeling with Kaldi: plain TDNN, convolutional TDNN (CNN-TDNN), long short-term memory TDNN (TDNN-LSTM) and TDNN-LSTM with attention. To the best of our knowledge, this is the first paper to describe in-depth these various flavors of TDNN, providing details regarding the speech features used at input, constituent layers and their dimensionality, regularization techniques etc. The various acoustic models were evaluated in conjunction with n-gram and recurrent language models in an automatic speech recognition (ASR) experiment for the Romanian language. We report significantly better results over the previous ASR systems for the same Romanian ASR tasks.
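As a rough sketch of what a single TDNN layer computes (this is not Kaldi NNET3 code; the layer sizes and splicing offsets below are hypothetical), each output frame applies a nonlinearity to a spliced context of input frames at fixed offsets, which is equivalent to a 1-D, possibly dilated, temporal convolution:

```python
# Minimal numpy sketch of a TDNN layer: splice input frames at fixed offsets,
# then apply an affine transform and a ReLU. Offsets that widen with depth
# grow the temporal receptive field, a common design in TDNN recipes.
import numpy as np

def tdnn_layer(x, weight, bias, offsets=(-1, 0, 1)):
    """x: (T, d_in) frames; weight: (len(offsets) * d_in, d_out); bias: (d_out,)."""
    T = x.shape[0]
    spliced = np.stack([
        np.concatenate([x[min(max(t + o, 0), T - 1)] for o in offsets])  # clamp at edges
        for t in range(T)
    ])
    return np.maximum(spliced @ weight + bias, 0.0)  # ReLU activations

# Hypothetical sizes: 40-dim input features, 256 hidden units per layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 40))
h1 = tdnn_layer(x,  rng.normal(size=(3 * 40,  256)) * 0.01, np.zeros(256), offsets=(-1, 0, 1))
h2 = tdnn_layer(h1, rng.normal(size=(3 * 256, 256)) * 0.01, np.zeros(256), offsets=(-3, 0, 3))
print(h2.shape)  # (100, 256)
```

The CNN-TDNN and TDNN-LSTM variants mentioned above keep this spliced-context idea but add a convolutional front-end or interleave LSTM layers, respectively.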

25 citations

Journal ArticleDOI
TL;DR: Distribution-scaling-based score normalization techniques are developed specifically for the in-set/out-of-set problem and compared against existing score normalization schemes used in open-set speaker recognition.
Abstract: In this paper, the problem of identifying in-set versus out-of-set speakers using extremely limited enrollment data is addressed. The recognition objective is to form a binary decision regarding an input speaker as being a legitimate member of a set of enrolled speakers or not. Here, the emphasis is on low enrollment (about 5 sec of speech for each enrolled speaker) and test data durations (2-8 sec), in a text-independent scenario. In order to overcome the limited enrollment, data from speakers that are acoustically close to a given in-set speaker are used to form an informative prior (base model) for speaker adaptation. Score normalization for in-set systems is addressed, and the difficulty of using conventional score normalization schemes for in-set speaker recognition is highlighted. Distribution scaling based score normalization techniques are developed specifically for the in-set/out-of-set problem and compared against existing score normalization schemes used in open-set speaker recognition. Experiments are performed using the following three separate corpora: (1) noise-free TIMIT; (2) noisy in-vehicle CU-move; and (3) the NIST-SRE-2006 database. Experimental results show a consistent increase in system performance for the proposed techniques.
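For context on why score normalization matters here, a conventional scheme such as Z-norm scales a raw verification score by impostor-score statistics before thresholding; the sketch below shows only that baseline, not the distribution-scaling variants proposed in the paper, and the threshold and score values are arbitrary.

```python
# Sketch of conventional Z-norm score normalization (background only; the
# paper develops its own distribution-scaling schemes for the in-set task).
import numpy as np

def znorm(raw_score, impostor_scores):
    """Normalize a raw score by the mean and std of impostor scores."""
    mu = np.mean(impostor_scores)
    sigma = np.std(impostor_scores) + 1e-12   # guard against zero variance
    return (raw_score - mu) / sigma

# Toy usage: accept the test speaker as in-set if the normalized score
# clears a decision threshold.
impostor_scores = np.random.default_rng(0).normal(loc=-2.0, scale=1.0, size=500)
print(znorm(1.3, impostor_scores) > 2.5)
```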

25 citations

Proceedings Article
01 Jan 2003
TL;DR: Two novel variants of TRAPS, developed to address some shortcomings of the TRAPS classifiers, are evaluated, and approximately 20 discriminative temporal patterns per critical band are found to be sufficient for good recognition performance.
Abstract: Motivated by the temporal processing properties of human hearing, researchers have explored various methods to incorporate temporal and contextual information in ASR systems. One such approach, TempoRAl PatternS (TRAPS), takes temporal processing to the extreme and analyzes the energy pattern over long periods of time (500 ms to 1000 ms) within separate critical bands of speech. In this paper we extend the work on TRAPS by experimenting with two novel variants of TRAPS developed to address some shortcomings of the TRAPS classifiers. Both the Hidden Activation TRAPS (HATS) and Tonotopic MultiLayer Perceptrons (TMLP) require 84% fewer parameters than TRAPS but can achieve significant phone recognition error reduction when tested on the TIMIT corpus under clean, reverberant, and several noise conditions. In addition, the TMLP performs training in a single stage and does not require critical band level training targets. Using these variants, we find that approximately 20 discriminative temporal patterns per critical band are sufficient for good recognition performance. In combination with a conventional PLP system, these TRAPS variants achieve significant additional performance improvements.
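As a rough sketch of the TRAP input representation (not the authors' code; the roughly 0.5 s window and 10 ms frame hop are assumptions), each per-band classifier sees a long, normalized log-energy trajectory from a single critical band:

```python
# Minimal sketch of TRAP-style inputs: for one critical band, take the
# ~510 ms log-energy trajectory centered on each frame and mean/variance
# normalize it before it is fed to that band's classifier.
import numpy as np

def trap_vectors(band_log_energy, context=51):
    """band_log_energy: (T,) log energies of one critical band at a 10 ms hop;
    context=51 frames corresponds to roughly 510 ms centered on each frame."""
    half = context // 2
    padded = np.pad(band_log_energy, half, mode="edge")       # repeat edge frames
    traps = np.stack([padded[t:t + context] for t in range(len(band_log_energy))])
    traps -= traps.mean(axis=1, keepdims=True)                # per-vector mean norm
    traps /= traps.std(axis=1, keepdims=True) + 1e-12         # per-vector variance norm
    return traps                                              # (T, context)

# Toy usage on 2 s of random "energies" (200 frames at a 10 ms hop).
print(trap_vectors(np.random.default_rng(0).normal(size=200)).shape)  # (200, 51)
```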

25 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 24
2022: 62
2021: 67
2020: 86
2019: 77
2018: 95