Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
25 Mar 2012
TL;DR: This paper introduces a two-phase segmentation process: first, forced alignment is performed using an HMM-GMM model; the resulting segmentation is then locally refined using an SVM-based boundary model.
Abstract: Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM-based boundary model. Both models are derived from multi-speaker data using a speaker-adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus.
10 citations
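The second phase of the paper above (SVM-based local boundary refinement) can be sketched as follows. Everything here is a toy stand-in: synthetic 4-dimensional frames replace real acoustic features, and a single hand-placed coarse boundary replaces the HMM-GMM forced alignment; only the refinement step is illustrated.

```python
# Hedged sketch: a coarse phone boundary from forced alignment is refined by
# an SVM that scores candidate boundary positions in a small local window.
# Feature extraction and the HMM-GMM aligner are mocked with synthetic data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic "frames": a spectral jump at the true boundary, index 50.
true_boundary = 50
frames = np.concatenate([rng.normal(0.0, 0.1, (true_boundary, 4)),
                         rng.normal(1.0, 0.1, (100 - true_boundary, 4))])

def boundary_features(frames, t, ctx=2):
    """Concatenate the frames around candidate boundary t (the SVM's input)."""
    return frames[t - ctx:t + ctx].ravel()

# Train the boundary model: positive at the true boundary, negatives elsewhere.
X = [boundary_features(frames, t) for t in range(10, 90)]
y = [1 if t == true_boundary else 0 for t in range(10, 90)]
svm = SVC(kernel="rbf").fit(X, y)

# Refinement: the forced-alignment boundary is (deliberately) 3 frames early;
# pick the candidate in a +/-5 frame window with the highest SVM score.
coarse = 47
window = list(range(coarse - 5, coarse + 6))
scores = [svm.decision_function([boundary_features(frames, t)])[0] for t in window]
refined = window[int(np.argmax(scores))]
print(refined)
```

In the real system the boundary model is trained on many speakers with speaker-adaptive training; here a single synthetic utterance stands in for that data.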
07 May 2001
TL;DR: This work proposes decomposing the network into modular components, each estimating a phone posterior, and shows that using broad-class posteriors alongside the phone posteriors greatly enhances acoustic modelling.
Abstract: Traditionally, neural networks such as multi-layer perceptrons handle acoustic context by increasing the dimensionality of the observation vector to include information from the neighbouring acoustic vectors on either side of the current frame. As a result, the monolithic network is trained on a high-dimensional space. The trend is to use the same fixed-size observation vector across a single network that estimates the posterior probabilities for all phones simultaneously. We propose a decomposition of the network into modular components, where each component estimates a phone posterior. The size of the observation vector is not fixed across the modularised networks, but rather reflects the phone that each network is trained to classify. For each observation vector, we estimate very large acoustic context through broad-class posteriors. Using the broad-class posteriors along with the phone posteriors greatly enhances acoustic modelling. We report significant improvements in phone classification and word recognition on the TIMIT corpus. Our results are also better than the best context-dependent system in the literature.
9 citations
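The modular idea above (one binary module per phone, each with its own context width) can be sketched with toy data. Logistic regressions stand in for the per-phone networks, and the two-phone set with its per-phone context radii is entirely made up for illustration.

```python
# Hedged sketch of modular phone-posterior estimation: one binary classifier
# per phone, each seeing a different amount of acoustic context, with the
# per-module outputs renormalised into a posterior distribution.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
frames_per_class = 200
phones = {"aa": 1, "s": 3}   # hypothetical phone -> context radius (frames)

# Toy 4-dim frames: class "aa" centred at 0, class "s" centred at 1.
data = {"aa": rng.normal(0.0, 0.3, (frames_per_class, 4)),
        "s":  rng.normal(1.0, 0.3, (frames_per_class, 4))}

def with_context(x, radius):
    """Stack each frame with its +/-radius neighbours (edges clamped)."""
    idx = np.clip(np.arange(len(x))[:, None] + np.arange(-radius, radius + 1),
                  0, len(x) - 1)
    return x[idx].reshape(len(x), -1)

# Train one binary module per phone, on that phone's own context width.
modules = {}
for p, r in phones.items():
    X = np.vstack([with_context(data[q], r) for q in phones])
    y = np.concatenate([np.full(frames_per_class, int(q == p)) for q in phones])
    modules[p] = (LogisticRegression(max_iter=1000).fit(X, y), r)

def phone_posteriors(x):
    """Query every module at its own context size, then renormalise."""
    scores = {p: m.predict_proba(with_context(x, r))[:, 1]
              for p, (m, r) in modules.items()}
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}

post = phone_posteriors(data["s"][:10])
print(post["s"].mean())
```

The design point the paper makes survives even in this sketch: because each module is trained separately, nothing forces the modules to share an input dimensionality.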
01 Jan 2002
TL;DR: A novel approach to integrating formant frequency and conventional MFCC data in phone recognition experiments on TIMIT; by exploiting the relationship between formant frequencies and vocal tract geometry, it reduces the error rate by 6.1% relative to a conventional representation alone.
Abstract: This paper presents a novel approach to integration of formant frequency and conventional MFCC data in phone recognition experiments on TIMIT. Naive use of formant data introduces classification errors if formant frequency estimates are poor, resulting in a net drop in performance. However, by exploiting a measure of confidence in the formant frequency estimates, formant data can contribute to classification in parts of a speech signal where it is reliable, and be replaced by conventional MFCC data when it is not. In this way an improvement of 4.7% is achieved. Moreover, by exploiting the relationship between formant frequencies and vocal tract geometry, simple formant-based vocal tract length normalisation reduces the error rate by 6.1% relative to a conventional representation alone.
9 citations
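The confidence-gated idea in the abstract can be sketched in a few lines. This is a simplification of the paper's scheme: here both streams have the same dimensionality and a single hard threshold is used, where a real system would combine features of different sizes and could blend rather than switch. All values and the threshold are illustrative.

```python
# Hedged sketch of confidence-gated feature fusion: formant features replace
# MFCCs only in frames where the formant tracker's confidence is high.
import numpy as np

def fuse(mfcc, formant, confidence, threshold=0.7):
    """Per-frame selection between two feature streams.

    mfcc, formant : (T, D) arrays of per-frame features (same D for simplicity)
    confidence    : (T,) formant-tracker confidence in [0, 1]
    """
    use_formant = confidence[:, None] >= threshold
    return np.where(use_formant, formant, mfcc)

# Tiny worked example: frames 0 and 2 have reliable formant estimates.
mfcc    = np.zeros((3, 2))
formant = np.ones((3, 2))
conf    = np.array([0.9, 0.3, 0.8])
fused = fuse(mfcc, formant, conf)
print(fused)
# frames 0 and 2 come from the formant stream, frame 1 from the MFCC stream
```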
29 May 2017
TL;DR: A speech biometric I-vector with a low, fixed dimension of 100 is used to identify speakers, showing an identification-rate improvement over the classical Gaussian Mixture Model-Universal Background Model (GMM-UBM) with a Maximum Likelihood (ML) classifier system.
Abstract: Physiological and behavioural human characteristics are exploited in biometrics, where measurements of an individual are matched either one-to-one (authentication) or one-from-N (identification). In this paper, we exploit a speech biometric I-vector with a low, fixed dimension of 100 to identify speakers. The main structure of the system consists of an I-vector with three fusion methods. It has low complexity and is efficient due to its use of an Extreme Learning Machine (ELM) classifier. The system is evaluated with 120 speakers from dialect regions one and four of both the TIMIT and NTIMIT databases, in order to provide a fair comparison with our previous study based on the traditional Gaussian Mixture Model-Universal Background Model (GMM-UBM) with a Maximum Likelihood (ML) classifier system. The system shows an identification rate improvement compared with the classical GMM-UBM.
9 citations
03 Oct 1996
TL;DR: The goal is to mimic the resolution properties of the human auditory system, but using a computationally efficient FFT-based front end rather than a more complex auditory model.
Abstract: The authors present an approach for efficiently computing a compact temporal/spectral feature set for representing a segment of speech, with effective resolution depending on both frequency and time position within the segment. The goal is to mimic the resolution properties of the human auditory system, but using a computationally efficient FFT-based front end rather than a more complex auditory model. In particular they apply both frequency and time "warping" to FFT spectra to obtain good frequency resolution at low frequencies and good time resolution at high frequencies. Time resolution is also varied so that the center of the segment is better represented than the endpoints. The resolution can be varied by the selection of "warping" functions controlled using a small number of parameters. The method was experimentally verified for the classification of six stops extracted from the TIMIT continuous speech database. The best classification rate obtained was 81.2% for test data using 50 features computed with the method presented.
9 citations
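The warping scheme described above can be sketched by resampling an FFT magnitude spectrogram on two warped axes: a mel-like frequency axis that is dense at low frequencies, and a time axis that over-samples the segment centre relative to the endpoints. The specific warping functions below are common stand-ins, not the parameterised functions of the paper, and the segment is random toy data.

```python
# Hedged sketch of frequency/time "warping" applied to FFT spectra to get a
# compact feature set with non-uniform resolution.
import numpy as np

def warp_axis(n_out, n_in, warp):
    """Map n_out uniformly spaced warped positions back to input indices."""
    u = np.linspace(0.0, 1.0, n_out)
    return np.clip((warp(u) * (n_in - 1)).round().astype(int), 0, n_in - 1)

# Frequency warp: grows slowly at first, so the output samples crowd into the
# low-frequency bins (mel-like behaviour).
freq_warp = lambda u: np.expm1(np.log(11.0) * u) / 10.0
# Time warp: flat around u=0.5, so the output samples crowd around the centre.
time_warp = lambda u: 0.5 + 0.5 * np.sign(2 * u - 1) * np.abs(2 * u - 1) ** 1.5

# Toy segment: 40 frames x 64 FFT magnitude bins.
seg = np.abs(np.random.default_rng(3).normal(size=(40, 64)))
fi = warp_axis(16, 64, freq_warp)   # 16 warped frequency samples
ti = warp_axis(10, 40, time_warp)   # 10 warped time samples
features = seg[np.ix_(ti, fi)]
print(features.shape)               # compact (10, 16) feature set

# The frequency warp spends most of its 16 samples on the lower 32 bins:
print((fi < 32).sum())
```

Changing the constants inside the two lambdas plays the role of the paper's small set of warping-control parameters: a steeper frequency warp trades high-frequency detail for low-frequency detail, and a larger time-warp exponent concentrates resolution more tightly at the segment centre.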