Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: This paper used the TIMIT corpus of spoken sentences produced by talkers from a number of distinct dialect regions in the United States, and found that several phonetic features distinguish between the dialects.
Abstract: The perception of phonological differences between regional dialects of American English by naive listeners is poorly understood. Using the TIMIT corpus of spoken sentences produced by talkers from a number of distinct dialect regions in the United States, an acoustic analysis conducted in Experiment I confirmed that several phonetic features distinguish between the dialects. In Experiment II recordings of the sentences were played back to naive listeners who were asked to categorize each talker into one of six geographical dialect regions. Results suggested that listeners are able to reliably categorize talkers into three broad dialect clusters, but have more difficulty accurately categorizing talkers into six smaller regions. Correlations between the acoustic measures and both actual dialect affiliation of the talkers and dialect categorization of the talkers by the listeners revealed that the listeners in this study were sensitive to acoustic‐phonetic features of the dialects in categorizing the talker...

5 citations

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work found that the peaks detected from the data-driven approach significantly improve the speech rate estimation when combined with the traditional TCSSBC approach using a proposed peak-merging strategy.
Abstract: A typical solution for speech rate estimation consists of two stages: first, computing a short-time feature contour such that most of the peaks of the contour correspond to syllable nuclei, and second, detecting those peaks. Temporal correlation selected subband correlation (TCSSBC) is often used as a feature contour for speech rate estimation, in which correlations within and across a few selected sub-band energies are computed. In this work, instead of a fixed set of sub-bands, we learn them in a data-driven manner using a dictionary learning approach. Similarly, instead of the energy contours, we use the activation profile of the learned dictionary elements. We found that the peaks detected from the data-driven approach significantly improve the speech rate estimation when combined with the traditional TCSSBC approach using a proposed peak-merging strategy. Experiments are performed separately on the Switchboard, TIMIT and CTIMIT corpora. For TIMIT and CTIMIT (though not for Switchboard), the correlation coefficient for speech rate estimation using the proposed approach is higher than that of the TCSSBC technique, with 3.1% and 5.2% relative improvements respectively.
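The two-stage pipeline this abstract describes (a peak-bearing feature contour, then peak picking over it) can be sketched in a few lines. This is a toy illustration with a synthetic contour and an arbitrary threshold, not the paper's TCSSBC features or its peak-merging strategy:

```python
def find_peaks(contour, threshold):
    """Indices of local maxima of the contour that exceed `threshold`.
    Each surviving peak is taken as one syllable nucleus."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i] > threshold
            and contour[i - 1] < contour[i] >= contour[i + 1]]

def speech_rate(contour, threshold, duration_s):
    """Syllables per second: detected nuclei / utterance duration."""
    return len(find_peaks(contour, threshold)) / duration_s

# Toy contour with three clear peaks over a 1.5 s utterance.
contour = [0.0, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2, 0.1, 0.7, 0.2, 0.0]
```

The paper's contribution replaces the fixed sub-band energies feeding such a contour with dictionary-learned activations, then merges the two peak sets before the final count.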

5 citations

31 Mar 2006
TL;DR: This paper addresses unsupervised speaker change detection with no prior knowledge of the number of speakers or their identities, testing a Bayesian Information Criterion (BIC) method with dynamic thresholding and a real-time metric-based approach that employs line spectral pairs (LSP) and uses the BIC criterion to validate potential speaker change points.
Abstract: This paper addresses the problem of unsupervised speaker change detection. We assume that there is no prior knowledge of the number of speakers or their identities. Two methods are tested. The first method uses the Bayesian Information Criterion (BIC), investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, and implements a dynamic thresholding followed by a fusion scheme. The second method is a real-time one that uses a metric-based approach employing line spectral pairs (LSP) and the BIC criterion to validate a potential speaker change point. The methods are tested on two different datasets. The first set was created by concatenating speakers from the TIMIT database and is referred to as the TIMIT data set. The second set was created by using recordings from the MPEG-7 test set CD1 and broadcast news and is referred to as the INESC dataset.
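The BIC test at the heart of both methods compares modelling a window with one Gaussian versus one Gaussian per side of a candidate change point, minus a model-size penalty. A minimal sketch, using a 1-D simplification with an illustrative penalty weight rather than the paper's full-covariance formulation over AudioSpectrumCentroid or LSP features:

```python
import math

def delta_bic(x, split, lam=1.0):
    """Delta-BIC for a candidate speaker change at index `split` in a 1-D
    feature sequence x. Positive values favour two separate Gaussians,
    i.e. a speaker change; lam weights the complexity penalty."""
    def var(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg) / len(seg)
    n, n1, n2 = len(x), split, len(x) - split
    penalty = 0.5 * 2 * math.log(n)  # d=1: two parameters (mean, variance)
    return (n * math.log(var(x))
            - n1 * math.log(var(x[:split]))
            - n2 * math.log(var(x[split:]))) / 2 - lam * penalty
```

Sliding this test over the stream and keeping points where the statistic exceeds zero (or a dynamic threshold, as in the first method) yields candidate change points.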

5 citations

Posted Content
TL;DR: The results indicate that, given a good E2E model pre-trained on normal or pseudo-whispered speech, a relatively small set of whispered speech may suffice to obtain a reasonably good end-to-end whispered speech recognizer.
Abstract: Whispering is an important mode of human speech, but no end-to-end recognition results for it had been reported, probably due to the scarcity of available whispered speech data. In this paper, we present several approaches for end-to-end (E2E) recognition of whispered speech that account for its special characteristics and the scarcity of data. These include a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor for better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model with normal or normal-to-whispered converted speech and then fine-tunes it with whispered speech to bridge the gap between whispered and normal speech. We achieve an overall relative reduction of 19.8% in PER and 44.4% in CER on a relatively small whispered TIMIT corpus. The results indicate that, as long as we have a good E2E model pre-trained on normal or pseudo-whispered speech, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
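A frequency-weighted masking policy in the spirit of the one this abstract names can be sketched by biasing where SpecAugment's frequency mask lands. The power-law bias toward low bins (so the high-frequency structure of whispered speech is masked less often) is an assumption made here for illustration; the paper's actual weighting may differ:

```python
import random

def freq_mask(spec, max_width, low_freq_bias=2.0, rng=random):
    """Apply one frequency mask to `spec` (a list of frames, each a list
    of mel-bin values). The mask start index is drawn with a bias toward
    LOW bins: u ** low_freq_bias pushes the start toward 0 for bias > 1,
    leaving high-frequency bins masked less often."""
    n_bins = len(spec[0])
    width = rng.randrange(1, max_width + 1)
    start = int((rng.random() ** low_freq_bias) * (n_bins - width))
    return [[0.0 if start <= f < start + width else v
             for f, v in enumerate(frame)]
            for frame in spec]
```

Standard SpecAugment draws the mask position uniformly; only the biased draw of `start` differs here.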

5 citations

Proceedings ArticleDOI
01 Dec 2017
TL;DR: An acoustic model to predict phone labels based on a recurrent neural network (RNN) with bidirectional long short-term memory (BLSTM) units, trained with the CTC technique; the positions of this model's spiky phone outputs are found to be consistent with the landmarks annotated in the TIMIT corpus.
Abstract: Acoustic features extracted in the vicinity of landmarks have demonstrated their usefulness for detecting mispronunciation in our recent work [1, 2]. Traditional approaches to detecting acoustic landmarks rely on annotations by linguists with prior knowledge of speech production mechanisms, which are laborious and expensive. This paper proposes a data-driven approach based on connectionist temporal classification (CTC) that can detect landmarks without any human labels while still maintaining performance consistent with knowledge-based models for stop burst landmarks. We designed an acoustic model to predict phone labels based on a recurrent neural network (RNN) with bidirectional long short-term memory (BLSTM) units, trained with the CTC technique. We found that the positions of the spiky phone outputs of this model are consistent with the landmarks annotated in the TIMIT corpus. Both data-driven and knowledge-based landmark models are applied to detect pronunciation errors of second-language (L2) Chinese learners. Experiments illustrate that the data-driven CTC landmark model is comparable to the knowledge-based model in pronunciation error detection, and fusing the two can further improve performance.
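The landmark idea above rests on a well-known property of CTC-trained models: frame posteriors are dominated by the blank symbol, with sparse, confident non-blank spikes. A minimal sketch of extracting those spike positions from per-frame posteriors (the 0.5 confidence threshold is an illustrative assumption, not the paper's setting):

```python
def ctc_spikes(posteriors, blank=0, threshold=0.5):
    """Return (frame_index, phone_index) pairs where a CTC model emits a
    confident non-blank spike. `posteriors` is a list of per-frame
    probability vectors; the rare frames where a non-blank phone
    dominates serve as landmark candidates."""
    spikes = []
    for t, frame in enumerate(posteriors):
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank and frame[best] >= threshold:
            spikes.append((t, best))
    return spikes
```

Converting the returned frame indices to times (via the model's frame shift) gives the landmark positions compared against the TIMIT annotations.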

5 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95