Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
TL;DR: In this article, a feature parameter is obtained by applying the Teager energy operator to the WPD (Wavelet Packet Decomposition) coefficients, and a threshold value is derived from the means and standard deviations of nonspeech frames.
Abstract: In this paper, a feature parameter is obtained by applying the Teager energy operator to the WPD (Wavelet Packet Decomposition) coefficients. The threshold value is obtained from the means and standard deviations of nonspeech frames. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to a typical VAD algorithm. ROC (Receiver Operating Characteristic) curves are used to compare the performance of the VADs for SNR values ranging from 10 dB to -10 dB.
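The abstract's core ingredients, the discrete Teager energy operator and a nonspeech-statistics threshold, can be sketched briefly. This is a minimal illustration, not the paper's implementation: function names are mine, the wavelet packet decomposition itself (e.g. via PyWavelets) is omitted, and the Teager operator is applied directly to frame samples.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]  # replicate endpoints
    return psi

def vad_threshold(nonspeech_frames, k=3.0):
    """Threshold from the mean and standard deviation of the frame-level
    Teager energies of known nonspeech frames (k is an illustrative margin)."""
    energies = [teager_energy(f).mean() for f in nonspeech_frames]
    return np.mean(energies) + k * np.std(energies)
```

A frame would then be labeled speech when its mean Teager energy exceeds the threshold; in the paper this decision operates on WPD subband coefficients rather than raw samples.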
1 citation
TL;DR: This work investigates two novel lattice-constrained Viterbi training strategies for improving sub-word unit (SWU) inventories discovered using an unsupervised sparse coding approach, finding that this lightly supervised approach substantially increases correspondence with the reference phonemes and in this case also improves pronunciation consistency.
1 citation
01 Jan 2009
TL;DR: In this paper, the authors used Fisher's F-ratio to identify the frequency regions containing the most discriminative speaker information and to suppress the phonetic information in the speech.
Abstract: This Master's thesis presents an investigation of the features and models used when constructing a robust speaker identification system using the TIMIT speaker database. Investigations of the k-means clustering algorithm and Gaussian mixture models (GMMs) for speaker modelling show an improvement in the identification rate when using the GMM speaker models.
The features for speaker identification should emphasize the individual differences in the speech while suppressing the phonetic information; the exact opposite is the case for the features used for speech recognition. However, the same features, the MFCCs, have been used for both tasks. Using Fisher's F-ratio to measure the frequency regions containing the most discriminative speaker information, we present a new set of features, the FRFCCs. They emphasize the regions with speaker-discriminative information and suppress the phonetic information in the speech. Fisher's F-ratio shows that the regions around the fundamental frequency (100 Hz) and the third (2500 Hz) and fourth (3500 Hz) formants contain large amounts of speaker information, while the region around the first formant (500 Hz) contains only phonetic information.
By adding noise to the TIMIT database we show that using the FRFCC features yields a better and more robust automatic speaker identification system. Finally, testing on speech from Danish TV, we show that using the FRFCCs instead of the MFCCs gives an improvement of 91%.
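Fisher's F-ratio, as used in the thesis above, compares between-speaker to within-speaker variance per feature dimension (e.g. per frequency bin). A minimal sketch, with the function name and input layout assumed by me rather than taken from the thesis:

```python
import numpy as np

def fisher_f_ratio(features_by_speaker):
    """Per-dimension F-ratio: variance of the per-speaker means (between-speaker)
    divided by the average per-speaker variance (within-speaker).
    features_by_speaker: list of (n_frames_i, n_dims) arrays, one per speaker."""
    means = np.array([f.mean(axis=0) for f in features_by_speaker])
    within = np.array([f.var(axis=0) for f in features_by_speaker]).mean(axis=0)
    between = means.var(axis=0)
    return between / within
```

Dimensions with a high ratio vary a lot across speakers relative to how much they vary for a single speaker, which is the property the FRFCC features are designed to emphasize.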
1 citation
TL;DR: In this article, the authors studied the problem of acoustic feature learning in the setting where they have access to an external, domain-mismatched dataset of paired speech and articulatory measurements, either with or without labels.
Abstract: Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions. One limitation of this prior work is that the learned feature models are difficult to port to new datasets or domains, and articulatory data is not available for most speech corpora. In this work we study the problem of acoustic feature learning in the setting where we have access to an external, domain-mismatched dataset of paired speech and articulatory measurements, either with or without labels. We develop methods for acoustic feature learning in these settings, based on deep variational CCA and extensions that use both source and target domain data and labels. Using this approach, we improve phonetic recognition accuracies on both TIMIT and Wall Street Journal and analyze a number of design choices.
1 citation
TL;DR: It is suggested that cepstral coefficients are able to model speech in a given environment in finer detail, whereas acoustic-phonetic-based features are more robust to changes in environment, so that combining both types of measurements leads to the best performance.
Abstract: This work classifies voiceless stop consonant place in CV tokens of English using burst release cues for clean (TIMIT) and telephone speech (NTIMIT). We compared the performance of cepstral coefficients to acoustic-phonetics-motivated features such as center of gravity, burst amplitude, and relative difference of formant amplitudes. In clean speech, cepstral coefficients resulted in better classification. However, for test data from NTIMIT, acoustic-phonetic-based features outperformed cepstral coefficients, particularly if models were trained on clean speech. In addition, augmenting cepstral coefficients with acoustic-phonetic-based measurements resulted in the best performance. These findings suggest that cepstral coefficients are able to model speech in a given environment in finer detail, whereas acoustic-phonetic-based features are more robust to changes in environment, so that combining both types of measurements leads to the best performance.
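One of the acoustic-phonetic burst cues named above, spectral center of gravity, is simply the amplitude-weighted mean frequency of the burst spectrum. A minimal sketch (the function name and windowing choice are mine, not the paper's exact measurement procedure):

```python
import numpy as np

def spectral_center_of_gravity(frame, sr):
    """Amplitude-weighted mean frequency (Hz) of a windowed burst frame."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return (freqs * spec).sum() / spec.sum()
```

For stop bursts this statistic tends to separate places of articulation (e.g. velar bursts concentrate energy lower than alveolar ones), which is why it is useful as a classification cue alongside burst amplitude and formant-amplitude differences.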
1 citation