scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
01 Jan 1993
TL;DR: A unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods, which has been shown to be effective for text-independent, vocabulary-independent sex, speaker, and language identification, and is promising for a variety of applications.
Abstract: In this paper we have presented a unified approach for the identification of non-linguistic speech features from recorded signals using phone-based acoustic likelihoods. The inclusion of this technique in speech-based systems can broaden the scope of applications of speech technologies and lead to more user-friendly systems. The approach is based on training a set of large phone-based ergodic HMMs for each non-linguistic feature to be identified (language, gender, speaker, ...), and identifying the feature as that associated with the model having the highest acoustic likelihood of the set. The decoding procedure is efficiently implemented by processing all the models in parallel using a time-synchronous beam search strategy. This has been shown to be a powerful technique for sex, language, and speaker identification, and has other possible applications such as dialect identification (including foreign accents) or identification of speech disfluencies. Sex identification for BREF and WSJ was error-free, and 99% accurate for TIMIT with 2 s of speech. Speaker-identification accuracies of 98.8% on TIMIT (168 speakers) and 99.1% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% if 2 utterances were used for identification. This identification accuracy was obtained on the 168 test speakers of TIMIT without making use of the phonetic transcriptions during training, verifying that it is not necessary to have labeled adaptation data. Speaker-independent models can be used to provide the labels used in building the speaker-specific models.
Being independent of the spoken text, and requiring only a small amount of identification speech (on the order of 2.5 s), this technique is promising for a variety of applications, particularly those for which continual, transparent verification is preferable. Tests of two-way language identification of read, laboratory speech show that with 2 s of speech the language is correctly identified as English or French with over 99% accuracy. Simply porting the approach to the conditions of telephone speech, two-way identification of French and English in the OGI multi-language telephone speech corpus was about 76% accurate with 2 s of speech, and increased to 82% with 10 s. The overall 10-language identification accuracy on the designated development test data of the OGI corpus is 59.7%. These results were obtained without the use of phone transcriptions for training, which were used for the experiments with laboratory speech. In conclusion, we propose a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique has been shown to be effective for text-independent, vocabulary-independent sex, speaker, and language identification. While phone labels have been used to train the speaker-independent seed models, these models can then be used to label unknown speech, thus avoiding the costly process of transcribing the speech data. The ability to accurately identify non-linguistic speech features can lead to more performant spoken language systems, enabling better and more friendly human-machine interaction.
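The decision rule the abstract describes, stripped to its core, is an argmax over per-feature acoustic models: score the utterance under one model per candidate value (speaker, language, gender) and pick the highest-scoring one. A minimal sketch, assuming toy single-Gaussian "models" in place of the paper's large ergodic phone HMMs (the names `identify`, `speaker_A`, etc. are illustrative, not from the paper):

```python
# Toy sketch of identification by maximum acoustic likelihood.
# Each "model" here is a diagonal Gaussian over 2-D frames; a real
# system would use large phone-based ergodic HMMs per feature value,
# decoded in parallel, but the argmax structure is the same.
import numpy as np

def log_likelihood(frames, mean, var):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    diff = frames - mean
    per_frame = -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=1)
    return per_frame.sum()

def identify(frames, models):
    """Return the label whose model gives the utterance the highest score."""
    scores = {label: log_likelihood(frames, m, v)
              for label, (m, v) in models.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
models = {
    "speaker_A": (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    "speaker_B": (np.array([3.0, 3.0]), np.array([1.0, 1.0])),
}
# 50 frames drawn near speaker_B's model
utterance = rng.normal([3.0, 3.0], 1.0, size=(50, 2))
print(identify(utterance, models))  # speaker_B
```

Because the same machinery works for any label set, swapping "speaker" for "language" or "gender" only changes which models are placed in the dictionary, which is the sense in which the approach is unified.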

39 citations

Proceedings Article (DOI)
13 May 2002
TL;DR: It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main peaks of the spectrum of a speech signal using a multi-stream paradigm leads to an improvement in the recognition performance.
Abstract: In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems. Our goal in this paper is to improve the performance of HMM-based ASR systems by exploiting some features that characterize speech sounds based on the auditory system and one based on the Fourier power spectrum. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main peaks of the spectrum of a speech signal using a multi-stream paradigm leads to an improvement in the recognition performance. The Hidden Markov Model Toolkit (HTK) was used throughout our experiments to test the use of the new multi-stream feature vector. A series of experiments on speaker-independent continuous-speech recognition has been carried out using a subset of the large read-speech corpus TIMIT. Using such a multi-stream paradigm, N-mixture mono-/tri-phone models, and a bigram language model, we found that the word error rate was decreased by about 4.01%.
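In HTK-style multi-stream HMMs, each observation is split into independent streams whose log-likelihoods are combined with per-stream weights. A minimal sketch of that fusion rule, with illustrative stream names and weights (not the paper's actual features or values):

```python
# Sketch of multi-stream score fusion: one HMM state holds separate
# Gaussian parameters per stream (MFCCs, auditory cues, spectral peaks),
# and the combined acoustic score is a weighted sum of per-stream
# log-likelihoods. All parameters below are toy values.
import numpy as np

def stream_log_likelihood(obs, mean, var):
    """Diagonal-Gaussian log-likelihood of one stream's observation."""
    diff = obs - mean
    return -0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var))

def multistream_score(frame, state, weights):
    """Weighted sum of per-stream log-likelihoods for one frame."""
    return sum(
        weights[name] * stream_log_likelihood(frame[name], mean, var)
        for name, (mean, var) in state.items()
    )

state = {  # one HMM state, three streams, toy parameters
    "mfcc": (np.zeros(12), np.ones(12)),
    "auditory_cues": (np.zeros(4), np.ones(4)),
    "spectral_peaks": (np.zeros(3), np.ones(3)),
}
weights = {"mfcc": 1.0, "auditory_cues": 0.5, "spectral_peaks": 0.5}
frame = {name: np.zeros(mean.shape) for name, (mean, _) in state.items()}
score = multistream_score(frame, state, weights)
```

The stream weights let a recognizer trust the classical MFCC stream more while still letting the auxiliary cues shift close decisions, which is one way a combination like the one above can improve recognition.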

39 citations

Proceedings Article (DOI)
22 May 2011
TL;DR: A new approach for phoneme recognition which aims at minimizing the phoneme error rate; the algorithm is derived by finding the gradient of a PAC-Bayesian bound and minimizing it by stochastic gradient descent.
Abstract: We describe a new approach for phoneme recognition which aims at minimizing the phoneme error rate. Building on structured prediction techniques, we formulate the phoneme recognizer as a linear combination of feature functions. We state a PAC-Bayesian generalization bound, which gives an upper bound on the expected phoneme error rate in terms of the empirical phoneme error rate. Our algorithm is derived by finding the gradient of the PAC-Bayesian bound and minimizing it by stochastic gradient descent. The resulting algorithm is iterative and easy to implement. Experiments on the TIMIT corpus show that our method achieves the lowest phoneme error rate compared to other discriminative and generative models with the same expressive power.
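The abstract's setup is a linear structured predictor: the recognizer scores candidates by a weight vector applied to a joint feature map, and the weights are updated by stochastic gradient steps. A hedged sketch, using a structured perceptron-style surrogate update on toy data in place of the paper's PAC-Bayesian gradient (the feature map `phi` and the data are stand-ins, not the paper's):

```python
# Sketch of a linear-in-features recognizer trained by stochastic
# gradient steps. Scores are w . phi(x, y); the update moves the weights
# toward the correct label's features and away from the current best
# wrong guess, a simple surrogate for minimizing the error rate.
import numpy as np

def phi(x, y):
    """Toy joint feature map: outer product of input and one-hot label."""
    onehot = np.zeros(3)
    onehot[y] = 1.0
    return np.outer(x, onehot).ravel()

def predict(w, x, labels=(0, 1, 2)):
    return max(labels, key=lambda y: w @ phi(x, y))

def sgd_step(w, x, y_true, lr=0.1):
    """One stochastic update on a single labeled example."""
    y_hat = predict(w, x)
    if y_hat != y_true:
        w = w + lr * (phi(x, y_true) - phi(x, y_hat))
    return w

rng = np.random.default_rng(1)
centers = np.eye(3)  # three classes with distinct mean vectors
w = np.zeros(9)
for _ in range(200):
    y = int(rng.integers(3))
    x = centers[y] + 0.1 * rng.normal(size=3)
    w = sgd_step(w, x, y)
accuracy = float(np.mean([predict(w, centers[y]) == y for y in range(3)]))
```

The paper's contribution is the specific objective being descended (a PAC-Bayesian bound on the expected phoneme error rate) rather than the generic update loop sketched here.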

39 citations

Proceedings Article (DOI)
13 Oct 2002
TL;DR: The results show that a reconstructed phase space approach is a viable method for classification of phonemes, with the potential for use in a continuous speech recognition system.
Abstract: A novel method for classifying speech phonemes is presented. Unlike traditional cepstral based methods, this approach uses histograms of reconstructed phase spaces. A naive Bayes classifier uses the probability mass estimates for classification. The approach is verified using isolated fricative, vowel, and nasal phonemes from the TIMIT corpus. The results show that a reconstructed phase space approach is a viable method for classification of phonemes, with the potential for use in a continuous speech recognition system.
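The pipeline the abstract describes has three steps: embed each signal in a reconstructed phase space via time-delay vectors, estimate a class-conditional probability mass function with a histogram over that space, and classify by naive-Bayes log-probability. A minimal sketch under toy assumptions (sinusoids stand in for phoneme signals; embedding dimension, delay, and bin counts are illustrative):

```python
# Sketch of phase-space histogram classification: time-delay embedding
# [x(t), x(t - tau)], a smoothed 2-D histogram as the class-conditional
# probability mass estimate, and a naive Bayes decision over bins.
import numpy as np

def embed(signal, tau=1):
    """Time-delay embedding into 2-D phase-space points."""
    return np.column_stack([signal[tau:], signal[:-tau]])

def histogram_pmf(signal, bins=8, lim=(-2, 2)):
    pts = embed(signal)
    hist, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                bins=bins, range=[lim, lim])
    hist += 1.0  # Laplace smoothing so unseen bins keep nonzero mass
    return hist / hist.sum()

def classify(signal, pmfs, bins=8, lim=(-2, 2)):
    """Naive Bayes over phase-space bins: sum of log P(bin | class)."""
    pts = embed(signal)
    edges = np.linspace(lim[0], lim[1], bins + 1)
    ix = np.clip(np.digitize(pts[:, 0], edges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(pts[:, 1], edges) - 1, 0, bins - 1)
    scores = {c: np.log(p[ix, iy]).sum() for c, p in pmfs.items()}
    return max(scores, key=scores.get)

t = np.arange(400)
slow = np.sin(2 * np.pi * 0.01 * t)  # stand-in for a vowel-like signal
fast = np.sin(2 * np.pi * 0.15 * t)  # stand-in for a fricative-like one
pmfs = {"slow": histogram_pmf(slow), "fast": histogram_pmf(fast)}
print(classify(fast, pmfs))  # fast
```

A slowly varying signal keeps consecutive samples close, so its phase-space points hug the diagonal, while a rapidly varying one spreads out; the histograms capture exactly that difference, with no cepstral features involved.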

39 citations

Proceedings Article
01 Jan 1997
TL;DR: The acoustic segmentation algorithm is replaced with "segmentation by recognition," a probabilistic algorithm that can combine multiple contextual constraints towards hypothesizing only the most likely segments, and a search algorithm is described that can efficiently use multiple models to enforce contextual constraints across all segments in a network.
Abstract: Recently, we have developed a probabilistic framework for segmentbased speech recognition that represents the speech signal as a network of segments and associated feature vectors [2]. Although in general, each path through the network does not traverse all segments, we argued that each path must account for all feature vectors in the network. We then demonstrated an efficient search algorithm that uses a single additional model to account for segments that are not traversed. In this paper, we present two new extensions to our framework. First, we replace our acoustic segmentation algorithm with “segmentation by recognition,” a probabilistic algorithm that can combine multiple contextual constraints towards hypothesizing only the most likely segments. Second, we generalize our framework to “near-miss modeling” and describe a search algorithm that can efficiently use multiple models to enforce contextual constraints across all segments in a network. We report experiments in phonetic recognition on the TIMIT corpus in which we achieve a diphone context-dependent error rate of 26.6% on the NIST core test set over 39 classes. This is a 12.8% reduction in error rate from our best previously reported result.
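The framework's central accounting rule is that every path through the segment network must score all feature vectors: segments the path traverses are scored by their phone models, and the rest are scored by a single additional model. A toy rendering of that rule, with made-up segment scores (the segment names and numbers are illustrative, not from the paper):

```python
# Sketch of segment-network path scoring where every segment is
# accounted for: on-path segments use their phone-model log-probability,
# off-path segments use a single extra "anti" model, so competing paths
# score the same total set of feature vectors and stay comparable.

# segment id -> (log P under phone model, log P under the extra model)
segments = {
    "s1": (-1.0, -3.0),
    "s2": (-2.5, -1.5),
    "s3": (-0.8, -2.8),
}

def path_score(path):
    """Sum phone scores on the path plus off-path scores elsewhere."""
    total = 0.0
    for seg, (phone_lp, off_lp) in segments.items():
        total += phone_lp if seg in path else off_lp
    return total

paths = [("s1", "s3"), ("s1", "s2"), ("s2",)]
best = max(paths, key=path_score)
```

The paper's "near-miss modeling" generalizes this by allowing multiple models, rather than one, to account for the off-path segments, which is what lets contextual constraints reach segments a path never traverses.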

38 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year | Papers
2023 | 24
2022 | 62
2021 | 67
2020 | 86
2019 | 77
2018 | 95