Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
01 Jan 2011
TL;DR: A novel supervised dimensionality reduction algorithm, called Globality-Locality Consistent Discriminant Analysis (GLCDA), which aims to preserve global and local discriminant information simultaneously and can provide a more faithful compact representation of high-dimensional observations than entirely global approaches or heuristic approaches aimed at preserving local information.
Abstract: Concatenating sequences of feature vectors helps to capture essential information about articulatory dynamics, at the cost of increasing the number of dimensions in the feature space, which may be characterized by the presence of manifolds. Existing supervised dimensionality reduction methods such as Linear Discriminant Analysis may destroy part of that manifold structure. In this paper, we propose a novel supervised dimensionality reduction algorithm, called Globality-Locality Consistent Discriminant Analysis (GLCDA), which aims to preserve global and local discriminant information simultaneously. Because it allows finding the optimal trade-off between the global and local structure of data sets, GLCDA can provide a more faithful compact representation of high-dimensional observations than entirely global approaches or heuristic approaches aimed at preserving local information. Experimental results on the TIMIT phone classification task show the effectiveness of the proposed algorithm.
6 citations
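The global/local trade-off described above can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the exact GLCDA objective is not given here, so the blend of a standard LDA within-class scatter with a k-nearest-neighbour local scatter, controlled by an `alpha` weight, is an illustrative assumption.

```python
import numpy as np

def lda_scatter(X, y):
    """Global within- and between-class scatter matrices (standard LDA)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def local_scatter(X, y, k=3):
    """Within-class scatter restricted to each point's k nearest same-class
    neighbours -- a simple stand-in for the 'locality' term."""
    d = X.shape[1]
    Sl = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        for xi in Xc:
            dists = np.linalg.norm(Xc - xi, axis=1)
            for j in np.argsort(dists)[1:k + 1]:
                diff = (xi - Xc[j])[:, None]
                Sl += diff @ diff.T
    return Sl

def glcda_projection(X, y, alpha=0.5, k=3, n_components=2):
    """Project onto directions maximizing between-class scatter relative to a
    blend of global and local within-class scatter; alpha = 1 is plain LDA."""
    Sw, Sb = lda_scatter(X, y)
    Sl = local_scatter(X, y, k)
    denom = alpha * Sw + (1 - alpha) * Sl + 1e-6 * np.eye(X.shape[1])
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(denom, Sb))
    order = np.argsort(-eigvals.real)
    return eigvecs.real[:, order[:n_components]]
```

Sweeping `alpha` between 0 and 1 moves the projection between purely local and purely global discriminant structure, which is the trade-off the abstract refers to.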
TL;DR: It is shown that combining the models during training not only improved performance but also simplified the fusion process during recognition, particularly for highly constrained fusion schemes such as synchronous model combination.
6 citations
22 Jul 2012
TL;DR: The preliminary experiments carried out for the TIMIT corpus reveal that the use of prominent pronunciation variants for each dialect leads to improved recognition performance.
Abstract: Mapping the acoustic sequence to lexical units is an issue in speech recognition. To address this, multiple pronunciations are included in the pronunciation dictionary. However, the number of lexical variants required for improved recognition is not clear, as pronunciation varies significantly across dialects. This can sometimes lead to poor recognition. In this paper, a systematic study is carried out to observe the effect of pronunciation variation on recognition accuracy. In particular, a data-driven approach is employed to observe pronunciation variation at the syllable level. The acoustic cues about the syllable boundaries are obtained from Group Delay (GD) segmentation. The preliminary experiments carried out for the TIMIT corpus reveal that the use of prominent pronunciation variants for each dialect leads to improved recognition performance.
6 citations
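The idea of keeping only "prominent" variants per dialect can be sketched as a frequency filter over observed pronunciations. The paper's actual data-driven procedure (GD segmentation and syllable-level alignment) is not reproduced; the relative-frequency threshold below is an illustrative assumption.

```python
from collections import Counter, defaultdict

def prominent_variants(observations, min_share=0.2):
    """Keep, per (dialect, word), only pronunciation variants whose relative
    frequency in the observed data is at least min_share.

    observations: iterable of (dialect, word, pronunciation) tuples, e.g.
    surface forms recovered from a data-driven alignment.
    """
    counts = defaultdict(Counter)
    for dialect, word, pron in observations:
        counts[(dialect, word)][pron] += 1
    lexicon = {}
    for key, variants in counts.items():
        total = sum(variants.values())
        lexicon[key] = [p for p, n in variants.most_common()
                        if n / total >= min_share]
    return lexicon
```

For example, if dialect `dr1` realizes "water" as `w ao dx er` six times, `w ao t er` three times, and `w aa t er` once, a 20% threshold keeps only the first two variants in the dictionary.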
01 Nov 2010
TL;DR: When vocabulary words are not repeated often in the training set, the best system is able to outperform its counterpart based on the TIMIT phonetic transcriptions, although recognition performance in both cases is poor.
Abstract: We address the automatic generation of acoustic subword units and an associated pronunciation dictionary for speech recognition. The speech audio is first segmented into phoneme-like units by detecting points at which the spectral characteristics of the signal change abruptly. These audio segments are subsequently subjected to agglomerative clustering in order to group similar acoustic segments. Finally, the orthography is iteratively aligned with the resulting transcription in terms of audio clusters in order to determine pronunciations of the training words. The approach is evaluated by applying it to two subsets of the TIMIT corpus, both of which have a closed vocabulary. It is found that, when vocabulary words occur often in the training set, the proposed technique delivers performance that is close to but lower than a system based on the TIMIT phonetic transcriptions. When vocabulary words are not repeated often in the training set, the best system is able to outperform its counterpart based on the TIMIT phonetic transcriptions, although recognition performance in both cases is poor.
6 citations
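The two first stages described in the abstract, boundary detection at abrupt spectral changes followed by agglomerative clustering of the resulting segments, can be sketched in NumPy. The distance measures, the fixed threshold, and the centroid-linkage merging rule are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def segment_by_spectral_change(frames, threshold):
    """Place a boundary wherever consecutive feature frames differ sharply.
    Returns (start, end) index pairs covering the whole utterance."""
    dist = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    cuts = [i + 1 for i in range(len(dist)) if dist[i] > threshold]
    edges = [0] + cuts + [len(frames)]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]

def cluster_segments(frames, segments, n_clusters):
    """Greedy agglomerative clustering of segments by centroid distance --
    a stand-in for grouping similar acoustic segments into subword units."""
    clusters = [[s] for s in segments]
    means = [frames[a:b].mean(axis=0) for a, b in segments]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                d = np.linalg.norm(means[i] - means[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        members = np.vstack([frames[a:b] for a, b in clusters[i]])
        means[i] = members.mean(axis=0)
        del clusters[j], means[j]
    return clusters
```

Each resulting cluster plays the role of one acoustic subword unit; the final stage in the abstract, iteratively aligning the orthography with the cluster transcription, would then map words to sequences of these unit labels.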
01 Nov 2018
TL;DR: A deep learning model that consists of a bidirectional Long Short-Term Memory (bi-LSTM) network and an attention mechanism to perform frame-wise Voice Activity Detection (VAD) outperforms a conventional LSTM-based VAD, and it is shown how the attention mechanism can help VAD tasks by visualizing the attention distribution of the model.
Abstract: In this study, we propose a deep learning model that consists of a bidirectional Long Short-Term Memory (bi-LSTM) network and an attention mechanism to perform frame-wise Voice Activity Detection (VAD). The bi-LSTM extracts frame-level annotations by summarizing information from both directions. The attention mechanism accepts these annotations, extracts the frames that are important to the voice-activity judgement, and aggregates the representations of those informative frames into an attention distribution vector, which is used as the feature for frame classification with a logistic classifier. We constructed four comparative models and performed experiments with the TIMIT corpus and noise signals. The experiments show that the proposed model outperforms a conventional LSTM-based VAD. We also show how the attention mechanism can help VAD tasks by visualizing the attention distribution of the model.
6 citations
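The attention step described above can be sketched in NumPy, assuming a standard additive-attention form over the bi-LSTM frame annotations; the bi-LSTM itself and all training are omitted, and the weight matrices `W_a`, `v_a`, `w` are placeholders, since the paper's exact parameterization is not given here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W_a, v_a):
    """Additive attention over frame annotations H (shape T x d):
    score_t = v_a . tanh(W_a h_t), weights = softmax(scores).
    Returns the attention distribution and the pooled context vector."""
    scores = np.tanh(H @ W_a.T) @ v_a   # one score per frame, shape (T,)
    weights = softmax(scores)
    context = weights @ H               # weighted sum of annotations, shape (d,)
    return weights, context

def vad_prob(context, w, b):
    """Logistic classification of the pooled representation."""
    return 1.0 / (1.0 + np.exp(-(context @ w + b)))
```

Visualizing `weights` over the frames of an utterance is exactly the kind of attention-distribution plot the paper uses to show which frames drive the voice-activity decision.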