scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published on this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
01 Jul 2017
TL;DR: This research focused on detecting systematic pronunciation errors made by Vietnamese learners of English, using SVM classifiers trained on a native corpus (TIMIT) and a non-native corpus (V.E Corpus).
Abstract: Pronunciation errors are often made by language learners. In particular, systematic mispronunciations, consisting of substitutions of native sounds for sounds of the target language that do not exist in the native language, are a significant problem for language learners. Automatic detection of this kind of error is therefore essential for building a Computer-Assisted Language Learning (CALL) system that helps learners improve their pronunciation. In this research, we focused on detecting systematic pronunciation errors made by Vietnamese learners of English. To this end, we used SVM classifiers trained on a native corpus (TIMIT) and a non-native corpus (V.E Corpus). The non-native corpus, constructed by the researchers and annotated by two trained Vietnamese professionals, includes 1550 utterances from 31 Vietnamese students. Each student was asked to read 50 English sentences designed to contain English phonemes frequently mispronounced by Vietnamese speakers. The experimental results showed that the detectors achieve at least 79% SAR and 10% FAR.

1 citations
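The detection setup described above, an SVM deciding whether a phoneme realization is the target sound or a typical substitution, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the TIMIT and V.E Corpus features are not available here, so synthetic Gaussian clusters stand in for MFCC-like acoustic vectors, and the phoneme labels in the comments are hypothetical.

```python
# Sketch of a per-phoneme substitution detector in the spirit of the paper:
# an SVM trained on acoustic features of the target sound versus features of
# the substituted sound. Synthetic clusters stand in for real MFCC features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical 13-dim MFCC-like vectors: class 0 = target English phoneme,
# class 1 = a typical native-language substitution.
n, d = 400, 13
X = np.vstack([rng.normal(0.0, 1.0, (n, d)),
               rng.normal(1.5, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # detection accuracy on held-out frames
print(f"held-out accuracy: {acc:.2f}")
```

In practice one such classifier would be trained per frequently mispronounced phoneme, with native examples drawn from TIMIT and non-native examples from the learner corpus.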

Proceedings ArticleDOI
20 Oct 2004
TL;DR: Experimental results indicate that the ARMA lattice model achieves improved noise resistance on vowel phonemes and fricative phonemes compared to the conventional mel-frequency cepstral coefficient (MFCC) method.
Abstract: In this paper, the result of a study on phoneme feature extraction, under a noisy environment, using an auto-regressive moving average (ARMA) lattice model, is presented. The phoneme characteristics are modeled and expressed in the form of ARMA lattice reflection coefficients for classification. Experimental results, based on the TIMIT speech database and NoiseX-92 noise database, indicate that the ARMA lattice model achieves an improved noise-resistant capability on vowel phonemes and fricative phonemes as compared to those of the conventional mel-frequency cepstral coefficient (MFCC) method.

1 citations
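The reflection-coefficient representation used above can be illustrated with its AR-only special case: the Levinson-Durbin recursion, which converts a frame's autocorrelation into lattice reflection (PARCOR) coefficients. This is a sketch of the related LPC technique, not the paper's full ARMA lattice model (which adds a moving-average branch), and the input frame here is a synthetic stand-in for TIMIT audio.

```python
# Levinson-Durbin recursion: AR lattice reflection coefficients from a
# speech frame's autocorrelation. The paper's ARMA lattice extends this
# with an MA part, omitted here.
import numpy as np

def reflection_coefficients(frame, order):
    """Return the `order` reflection (PARCOR) coefficients of a frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]                      # prediction error energy
    k = np.zeros(order)
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m - 1:0:-1]
        k[m - 1] = -acc / e
        a[1:m + 1] = a[1:m + 1] + k[m - 1] * a[m - 1::-1][:m]
        e *= (1.0 - k[m - 1] ** 2)
    return k

# A synthetic voiced-like frame (decaying sinusoid) as a TIMIT stand-in.
t = np.arange(400)
frame = np.exp(-t / 200.0) * np.sin(2 * np.pi * 0.05 * t)
k = reflection_coefficients(frame, order=8)
print(k)  # all |k| < 1 for a stable model
```

The autocorrelation method guarantees that every reflection coefficient has magnitude below one, which is part of what makes lattice parameterizations attractive as robust features.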

Journal Article
TL;DR: A novel statistical approach to corpus-based speech synthesis that defines a probabilistic linguistic bigram model for unit selection, with the selected units extracted from the English TIMIT corpus.
Abstract: In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and treated as the acoustic reference to be respected, and many studies have elaborated acoustic unit classification along these lines. This type of classification separates units according to their symbolic characteristics; target cost and concatenation cost were classically defined for unit selection. In corpus-based speech synthesis systems using large text corpora, cost functions have been limited to a juxtaposition of symbolic criteria, and the acoustic information of units is not exploited in the definition of the target cost. In this paper, we take into consideration the phonetic information of units, which corresponds to their acoustic information. This is realized by defining a probabilistic linguistic bigram model used for unit selection. The selected units are extracted from the English TIMIT corpus. Keywords: unit selection, corpus-based speech synthesis, bigram model

1 citations
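The bigram-scoring idea above can be sketched as follows: estimate phone bigram probabilities from a corpus and use them to rank candidate unit sequences during selection. This is an illustrative toy, not the paper's system; the three-utterance "corpus" and its phone labels are hypothetical stand-ins for TIMIT transcriptions.

```python
# Sketch: a smoothed phone bigram model used to score candidate unit
# sequences for selection. Toy corpus stands in for TIMIT transcriptions.
from collections import Counter
from math import log

corpus = [["sil", "h", "eh", "l", "ow", "sil"],
          ["sil", "h", "ah", "l", "ow", "sil"],
          ["sil", "l", "ow", "sil"]]

unigrams, bigrams = Counter(), Counter()
for utt in corpus:
    unigrams.update(utt[:-1])            # history counts
    bigrams.update(zip(utt, utt[1:]))    # adjacent phone pairs

def bigram_logprob(seq, alpha=1.0):
    """Add-alpha smoothed log-probability of a phone sequence."""
    vocab = len(unigrams)
    lp = 0.0
    for a, b in zip(seq, seq[1:]):
        lp += log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
    return lp

likely = bigram_logprob(["sil", "h", "eh", "l", "ow", "sil"])
unlikely = bigram_logprob(["sil", "ow", "h", "sil"])
print(likely, unlikely)  # a plausible sequence scores higher
```

During selection, such a linguistic score would be combined with the classical target and concatenation costs rather than replace them.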

Proceedings ArticleDOI
Tong Fu1, Xihong Wu1
01 Jul 2017
TL;DR: A multi-scale method to mitigate the time-frequency resolution tradeoff, together with a model architecture that analyzes speech at multiple scales, is proposed; experimental results show that the proposed architecture obtains significant performance improvements.
Abstract: Deep learning has brought a breakthrough in the performance of speech recognition. Speech recognition systems based on deep neural networks have obtained state-of-the-art performance on various speech recognition tasks. These systems almost always use Mel-frequency cepstral coefficients or Mel-scale log-filterbank coefficients, which are based on the short-time Fourier transform. Although these features are designed around the auditory characteristics of human hearing, the inherent tradeoff between temporal and frequency resolution remains in spectral representations based on the short-time Fourier transform. In this paper, we propose a multi-scale method to mitigate this tradeoff and a model architecture that analyzes speech at multiple scales. Experiments are conducted on the TIMIT and HKUST corpora. We compare the proposed multi-scale features with traditional features under various configurations. Experimental results show that the proposed model architecture obtains significant performance improvements.
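The tradeoff the paper targets is easy to see in code: a short STFT window gives fine time resolution but coarse frequency resolution, and a long window the reverse. A minimal multi-scale front end simply computes spectrograms at several window lengths and lets the model combine them. The sketch below uses a synthetic tone and plain numpy; the window sizes and hop are illustrative choices, not the paper's configuration.

```python
# Minimal multi-scale STFT front end: magnitude spectrograms at several
# window lengths over the same signal. Longer windows yield more frequency
# bins but fewer, coarser time frames.
import numpy as np

def stft_mag(x, win, hop):
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (frames, win//2 + 1)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # 1 s synthetic tone as a speech stand-in

scales = {win: stft_mag(x, win, hop=160) for win in (128, 512, 2048)}
for win, S in scales.items():
    print(win, S.shape)
```

A multi-scale model would consume all of these representations in parallel instead of committing to a single analysis window.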
Journal ArticleDOI
TL;DR: The results on the TIMIT phone recognition task show that the proposed postprocessor can lead to significant improvements, especially when Hidden Markov Models (HMMs) are used as the primary acoustic model.
Abstract: In this paper, we present a novel postprocessor for speech recognition using the Augmented Conditional Random Field (ACRF) framework. In this framework, a primary acoustic model is used to generate state posterior scores per frame. These output scores are fed to the ACRF postprocessor for further frame-based acoustic modeling. Since the ACRF explicitly integrates acoustic context modeling, the postprocessor has the ability to discover new context information and to improve recognition accuracy. The results on the TIMIT phone recognition task show that the proposed postprocessor can lead to significant improvements, especially when Hidden Markov Models (HMMs) are used as the primary acoustic model.
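The pipeline idea, a primary model emitting per-frame state posteriors that a sequence-level model rescores with context, can be sketched without implementing the ACRF itself. Below, a plain Viterbi pass with a transition matrix stands in for the context-modeling step; the posteriors and transition scores are invented toy values, not outputs of any trained model.

```python
# Sketch of posterior postprocessing: frame-level state posteriors from a
# primary acoustic model are rescored by a sequence model (here, plain
# Viterbi decoding, standing in for the ACRF's context modeling).
import numpy as np

def viterbi(log_post, log_trans):
    """Best state path given frame log-posteriors and log-transition scores."""
    T, S = log_post.shape
    delta = log_post[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_post[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 1 - 1, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1][1:] if False else path[::-1]

# Toy posteriors for 3 states over 6 frames: frame 3 is noisy, but a
# self-transition-favoring prior smooths the decision.
post = np.array([[.8, .1, .1], [.7, .2, .1], [.6, .3, .1],
                 [.3, .6, .1], [.7, .2, .1], [.8, .1, .1]])
trans = np.full((3, 3), 0.05)
np.fill_diagonal(trans, 0.9)
path = viterbi(np.log(post), np.log(trans))
print(path)  # the noisy frame is smoothed back to state 0
```

The ACRF goes further by learning context-dependent feature functions rather than a fixed transition matrix, but the data flow, posteriors in, smoothed state sequence out, is the same.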

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95