Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
01 Jan 2011
TL;DR: A novel supervised dimensionality reduction algorithm, called Globality-Locality Consistent Discriminant Analysis (GLCDA), which aims to preserve global and local discriminant information simultaneously and can provide a more faithful compact representation of high-dimensional observations than entirely global approaches or heuristic approaches aimed at preserving local information.
Abstract: Concatenating sequences of feature vectors helps to capture essential information about articulatory dynamics, at the cost of increasing the number of dimensions in the feature space, which may be characterized by the presence of manifolds. Existing supervised dimensionality reduction methods such as Linear Discriminant Analysis may destroy part of that manifold structure. In this paper, we propose a novel supervised dimensionality reduction algorithm, called Globality-Locality Consistent Discriminant Analysis (GLCDA), which aims to preserve global and local discriminant information simultaneously. Because it allows finding the optimal trade-off between the global and local structure of data sets, GLCDA can provide a more faithful compact representation of high-dimensional observations than entirely global approaches or heuristic approaches aimed at preserving local information. Experimental results on the TIMIT phone classification task show the effectiveness of the proposed algorithm.
6 citations
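The global/local trade-off described above can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the exact GLCDA objective is not given here, so the blend of a standard LDA within-class scatter with a k-nearest-neighbour local scatter, controlled by an `alpha` weight, is an illustrative assumption.

```python
import numpy as np

def lda_scatter(X, y):
    """Global within- and between-class scatter matrices (standard LDA)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def local_scatter(X, y, k=3):
    """Within-class scatter restricted to each point's k nearest same-class
    neighbours -- a simple stand-in for the 'locality' term."""
    d = X.shape[1]
    Sl = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        for xi in Xc:
            dists = np.linalg.norm(Xc - xi, axis=1)
            for j in np.argsort(dists)[1:k + 1]:
                diff = (xi - Xc[j])[:, None]
                Sl += diff @ diff.T
    return Sl

def glcda_projection(X, y, alpha=0.5, k=3, n_components=2):
    """Project onto directions maximizing between-class scatter relative to a
    blend of global and local within-class scatter; alpha = 1 is plain LDA."""
    Sw, Sb = lda_scatter(X, y)
    Sl = local_scatter(X, y, k)
    denom = alpha * Sw + (1 - alpha) * Sl + 1e-6 * np.eye(X.shape[1])
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(denom, Sb))
    order = np.argsort(-eigvals.real)
    return eigvecs.real[:, order[:n_components]]
```

Sweeping `alpha` between 0 and 1 moves the projection between purely local and purely global discriminant structure, which is the trade-off the abstract refers to.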
TL;DR: It is shown that combining the models during training not only improved performance but also simplified the fusion process during recognition, particularly for highly constrained fusion schemes such as synchronous model combination.
6 citations
22 Jul 2012
TL;DR: The preliminary experiments carried out for the TIMIT corpus reveal that the use of prominent pronunciation variants for each dialect leads to improved recognition performance.
Abstract: Mapping the acoustic sequence to lexical units is an issue in speech recognition. To address this, multiple pronunciations are included in the pronunciation dictionary. However, the number of lexical variants required for improved recognition is not clear, as pronunciation varies significantly across dialects. This can sometimes lead to poor recognition. In this paper, a systematic study is carried out to observe the effect of pronunciation variation on recognition accuracy. In particular, a data-driven approach is employed to observe pronunciation variation at the syllable level. The acoustic cues about the syllable boundaries are obtained from Group Delay (GD) segmentation. The preliminary experiments carried out for the TIMIT corpus reveal that the use of prominent pronunciation variants for each dialect leads to improved recognition performance.
6 citations
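The idea of keeping only "prominent" variants per dialect can be sketched as a frequency filter over observed pronunciations. The paper's actual data-driven procedure (GD segmentation and syllable-level alignment) is not reproduced; the relative-frequency threshold below is an illustrative assumption.

```python
from collections import Counter, defaultdict

def prominent_variants(observations, min_share=0.2):
    """Keep, per (dialect, word), only pronunciation variants whose relative
    frequency in the observed data is at least min_share.

    observations: iterable of (dialect, word, pronunciation) tuples, e.g.
    surface forms recovered from a data-driven alignment.
    """
    counts = defaultdict(Counter)
    for dialect, word, pron in observations:
        counts[(dialect, word)][pron] += 1
    lexicon = {}
    for key, variants in counts.items():
        total = sum(variants.values())
        lexicon[key] = [p for p, n in variants.most_common()
                        if n / total >= min_share]
    return lexicon
```

For example, if dialect `dr1` realizes "water" as `w ao dx er` six times, `w ao t er` three times, and `w aa t er` once, a 20% threshold keeps only the first two variants in the dictionary.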
01 Nov 2010
TL;DR: When vocabulary words are not repeated often in the training set, the best system is able to outperform its counterpart based on the TIMIT phonetic transcriptions, although recognition performance in both cases is poor.
Abstract: We address the automatic generation of acoustic subword units and an associated pronunciation dictionary for speech recognition. The speech audio is first segmented into phoneme-like units by detecting points at which the spectral characteristics of the signal change abruptly. These audio segments are subsequently subjected to agglomerative clustering in order to group similar acoustic segments. Finally, the orthography is iteratively aligned with the resulting transcription in terms of audio clusters in order to determine pronunciations of the training words. The approach is evaluated by applying it to two subsets of the TIMIT corpus, both of which have a closed vocabulary. It is found that, when vocabulary words occur often in the training set, the proposed technique delivers performance that is close to but lower than a system based on the TIMIT phonetic transcriptions. When vocabulary words are not repeated often in the training set, the best system is able to outperform its counterpart based on the TIMIT phonetic transcriptions, although recognition performance in both cases is poor.
6 citations
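The two first stages described in the abstract, boundary detection at abrupt spectral changes followed by agglomerative clustering of the resulting segments, can be sketched in NumPy. The distance measures, the fixed threshold, and the centroid-linkage merging rule are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def segment_by_spectral_change(frames, threshold):
    """Place a boundary wherever consecutive feature frames differ sharply.
    Returns (start, end) index pairs covering the whole utterance."""
    dist = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    cuts = [i + 1 for i in range(len(dist)) if dist[i] > threshold]
    edges = [0] + cuts + [len(frames)]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]

def cluster_segments(frames, segments, n_clusters):
    """Greedy agglomerative clustering of segments by centroid distance --
    a stand-in for grouping similar acoustic segments into subword units."""
    clusters = [[s] for s in segments]
    means = [frames[a:b].mean(axis=0) for a, b in segments]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                d = np.linalg.norm(means[i] - means[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        members = np.vstack([frames[a:b] for a, b in clusters[i]])
        means[i] = members.mean(axis=0)
        del clusters[j], means[j]
    return clusters
```

Each resulting cluster plays the role of one acoustic subword unit; the final stage in the abstract, iteratively aligning the orthography with the cluster transcription, would then map words to sequences of these unit labels.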
01 Nov 2018
TL;DR: A deep learning model that consists of a bidirectional Long Short-Term Memory (bi-LSTM) network and an attention mechanism to perform frame-wise Voice Activity Detection (VAD) outperforms a conventional LSTM-based VAD, and it is shown how the attention mechanism can help VAD tasks by visualizing the attention distribution of the model.
Abstract: In this study, we propose a deep learning model that consists of a bidirectional Long Short-Term Memory (bi-LSTM) network and an attention mechanism to perform frame-wise Voice Activity Detection (VAD). The bi-LSTM extracts frame-level annotations by summarizing information from both directions. The attention mechanism accepts these annotations, extracts the frames that are important to the voice-activity judgement, and aggregates the representations of those informative frames into an attention distribution vector, which is used as the feature for frame classification with a logistic classifier. We constructed four comparative models and performed experiments with the TIMIT corpus and noise signals. The experiments show that the proposed model outperforms a conventional LSTM-based VAD. We also show how the attention mechanism can help VAD tasks by visualizing the attention distribution of the model.
6 citations
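The attention step described above can be sketched in NumPy, assuming a standard additive-attention form over the bi-LSTM frame annotations; the bi-LSTM itself and all training are omitted, and the weight matrices `W_a`, `v_a`, `w` are placeholders, since the paper's exact parameterization is not given here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, W_a, v_a):
    """Additive attention over frame annotations H (shape T x d):
    score_t = v_a . tanh(W_a h_t), weights = softmax(scores).
    Returns the attention distribution and the pooled context vector."""
    scores = np.tanh(H @ W_a.T) @ v_a   # one score per frame, shape (T,)
    weights = softmax(scores)
    context = weights @ H               # weighted sum of annotations, shape (d,)
    return weights, context

def vad_prob(context, w, b):
    """Logistic classification of the pooled representation."""
    return 1.0 / (1.0 + np.exp(-(context @ w + b)))
```

Visualizing `weights` over the frames of an utterance is exactly the kind of attention-distribution plot the paper uses to show which frames drive the voice-activity decision.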