scispace - formally typeset

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 1998
TL;DR: A frame selection procedure for text-independent speaker identification: instead of averaging frame likelihoods over the whole test utterance, some frames are rejected (pruning) and the final score is computed from a limited number of frames.
Abstract: In this paper, we propose a frame selection procedure for text-independent speaker identification. Instead of averaging the frame likelihoods along the whole test utterance, some of these are rejected (pruning) and the final score is computed with a limited number of frames. This pruning stage requires a prior frame-level likelihood normalization in order to make comparison between frames meaningful. This normalization procedure alone leads to a significant performance enhancement. As far as pruning is concerned, the optimal number of frames pruned is learned on a tuning data set for normal and telephone speech. Validation of the pruning procedure on 567 speakers leads to a 27% identification rate improvement on TIMIT, and to 17% on NTIMIT.
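The normalize-then-prune idea described above can be sketched as follows. This is a minimal illustration assuming per-frame log-likelihoods are available for each enrolled speaker; the function name and the keep fraction are illustrative, not from the paper:

```python
import numpy as np

def prune_and_score(frame_loglik, keep_fraction=0.7):
    """Frame-pruning score for speaker identification.

    frame_loglik: array of shape (n_frames, n_speakers) holding
    per-frame log-likelihoods for each enrolled speaker model.
    """
    # Frame-level normalization: subtract the log-sum-exp over speakers
    # so that scores from different frames become comparable.
    norm = frame_loglik - np.logaddexp.reduce(frame_loglik, axis=1, keepdims=True)
    # Treat a frame's best normalized score as its reliability.
    reliability = norm.max(axis=1)
    # Keep only the most reliable frames; prune the rest.
    n_keep = max(1, int(keep_fraction * norm.shape[0]))
    kept = np.argsort(reliability)[-n_keep:]
    # Final score per speaker: average over the kept frames only.
    scores = norm[kept].mean(axis=0)
    return int(np.argmax(scores)), scores
```

With `keep_fraction=1.0` this reduces to plain likelihood averaging, so the pruning effect can be isolated by varying that single parameter on a tuning set, as the abstract describes.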
Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, a Wasserstein Generative Adversarial Network (WGAN) based spoken keyword detection method was proposed, where the generator in WGAN fits the observation data to generate new data, and the discriminator classifies the generated data and the labels.
Abstract: With the rapid development of artificial neural networks, they are being applied across all areas of computer technology. This paper combines deep neural networks and keyword detection technology to propose a Wasserstein Generative Adversarial Network (WGAN) based spoken keyword detection method that differs substantially from existing approaches. Exploiting the WGAN's ability to generate data autonomously, new sequences are generated and analyzed to determine whether keywords are present and where they appear. In this method, the generator in the WGAN fits the observation data to generate new data, and the discriminator classifies the generated data and the labels. The generator and discriminator are trained through adversarial learning. The proposed method is simple, does not require complex acoustic models, and does not need transcription into text; it is therefore also applicable to languages without a written form. The TIMIT corpus and a self-recorded Chinese corpus have been used for conducting experiments. Our method is compared with a Convolutional Neural Network (CNN) and a Deep Convolutional Generative Adversarial Network (DCGAN) and shows significant improvement over these techniques.
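The adversarial training signal underlying WGANs can be illustrated with a toy sketch. This is not the paper's model (which uses neural networks on speech features); it shows only the original WGAN critic update, i.e. maximizing the score gap between real and generated samples and clipping weights to keep the critic approximately 1-Lipschitz, using a hypothetical linear critic:

```python
import numpy as np

def critic(x, w):
    """A linear 'critic' scoring feature vectors (toy stand-in for
    the WGAN discriminator network)."""
    return x @ w

def wgan_critic_step(real, fake, w, lr=0.01, clip=0.1):
    """One critic update: ascend the gradient of
    E[critic(real)] - E[critic(fake)], then clip the weights
    (the weight-clipping trick from the original WGAN recipe)."""
    # For a linear critic, the gradient of the objective w.r.t. w
    # is simply mean(real) - mean(fake).
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = w + lr * grad
    return np.clip(w, -clip, clip)
```

Repeating this step drives the critic to separate real from generated data, which in turn gives the generator a useful training gradient.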
Proceedings Article
01 Jan 1999
TL;DR: An advanced multi-level vowel spotting method is used to achieve minimum vowel loss and accurate detection of vowel location and duration; the resulting system showed significant performance improvement compared to similar systems.
Abstract: This paper presents a hybrid ANN/HMM syllable recognition module based on vowel spotting. An advanced multi-level vowel spotting method is used to achieve minimum vowel loss and accurate detection of vowel location and duration. Discrete Hidden Markov Models (DSHMM), Multi Layer Perceptrons (MLP) and Heuristics (HR) are used for this purpose. A hybrid ANN/HMM technique is then used to recognize the syllables between the detected vowels. We replace the usual DSHMM probability parameters with combined neural network outputs. For this purpose both context dependent (CD) and context independent (CI) neural networks are used. Global normalization is employed on the parameters, as opposed to the local normalization used in standard HMMs. Also, all parameters are estimated simultaneously according to the discriminative conditional maximum likelihood (CML) criterion. The tests were performed on the TIMIT and NTIMIT databases and showed significant performance improvement compared to similar systems.
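Replacing HMM emission probabilities with neural-network outputs is usually done via the standard hybrid recipe of dividing network posteriors by state priors; the sketch below shows that conversion. This is the common hybrid ANN/HMM technique, not necessarily the exact combination and normalization scheme used in this paper:

```python
import numpy as np

def scaled_likelihoods(nn_posteriors, state_priors, eps=1e-12):
    """Hybrid ANN/HMM emission scores.

    Divide the network's posterior P(state | frame) by the state
    prior P(state): by Bayes' rule the result is proportional to
    the likelihood P(frame | state), which is what the HMM's
    emission model expects in place of its usual densities.

    nn_posteriors: (n_frames, n_states) softmax outputs.
    state_priors:  (n_states,) prior probabilities of each state.
    """
    return nn_posteriors / (state_priors + eps)
```

States with small priors get boosted, so a frequent state cannot dominate decoding merely because the network saw it often in training.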
Proceedings ArticleDOI
01 Jul 2002
TL;DR: A novel speech beamformer for moving speakers in noisy environments that identifies the speech signal DOA in the direction where the signal's spectrum entropy is minimized and shows significant improvement in the recognition rate of moving speakers especially in very low SNR.
Abstract: In hands-free speech recognition of moving speakers, the time interval where the source position can be assumed stationary varies. It is very common for the speaker to move rapidly within the data window exploited. In such cases the conventional fixed-window direction of arrival (DOA) estimation may lead to poor tracking performance. In this paper we present a novel speech beamformer for moving speakers in noisy environments. The localization algorithm extracts a set of candidate DOAs of the signal sources using array signal processing methods in the frequency domain. A minimum variance (MV) beamformer identifies the speech signal DOA as the direction in which the signal's spectral entropy is minimized. The same localization algorithm is used to detect the closest direction to the initial estimation using a smaller window. The proposed method is evaluated using a phoneme recognition system and noise recordings from an air-conditioning fan and the TIMIT speech corpus. Extensive experiments, carried out in the range of 25 to 0 dB SNR, show significant improvement in the recognition rate of moving speakers, especially at very low SNR.
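The entropy-based DOA selection described above can be sketched as follows: speech has a more structured (peakier) spectrum than diffuse noise, so among candidate steering directions the beamformer output with the lowest spectral entropy is taken as the speech direction. The function names and the dict-based interface are illustrative, not from the paper:

```python
import numpy as np

def spectral_entropy(spectrum, eps=1e-12):
    """Entropy of a magnitude spectrum normalized to a distribution.
    Peaky (speech-like) spectra give low entropy; flat (noise-like)
    spectra give high entropy."""
    p = spectrum / (spectrum.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def select_speech_doa(candidate_spectra):
    """Pick the candidate direction whose beamformed output has the
    lowest spectral entropy.

    candidate_spectra: dict mapping DOA (degrees) -> magnitude
    spectrum of the beamformer output steered at that DOA.
    """
    return min(candidate_spectra,
               key=lambda doa: spectral_entropy(candidate_spectra[doa]))
```

A flat spectrum of length N has entropy log(N), the maximum, so any direction capturing harmonic speech structure will score below the diffuse-noise directions.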
Proceedings ArticleDOI
07 May 2001
TL;DR: Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that are removed from sentences in the TIMIT corpus and the most likely candidate is selected to reconstruct the sentence.
Abstract: Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that are removed from sentences in the TIMIT corpus. Probabilities for the occurrence of the missing phoneme(s) are generated and the most likely candidate(s) selected to reconstruct the sentence. The method includes symmetric and asymmetric 'confidence windowing' around the missing phoneme(s) for determination of the most likely candidates. The reconstruction rates for one or more phonemes missing in a sequence can exceed 85%.
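A minimal version of ranking candidates for a missing phoneme from its surrounding context can be sketched with bigram statistics and a symmetric confidence window of one phoneme on each side. This is an illustrative toy (add-one smoothing and the function names are assumptions, not the paper's exact statistical model):

```python
from collections import defaultdict

def train_bigrams(sequences):
    """Count phoneme bigrams over a corpus of phoneme sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def fill_gap(left, right, counts, inventory):
    """Rank candidates for one missing phoneme using its left and
    right neighbors (a symmetric window), with add-one smoothing."""
    v = len(inventory)

    def score(p):
        # P(p | left) * P(right | p)
        pl = (counts[left][p] + 1) / (sum(counts[left].values()) + v)
        pr = (counts[p][right] + 1) / (sum(counts[p].values()) + v)
        return pl * pr

    return max(inventory, key=score)
```

The asymmetric windowing mentioned in the abstract would correspond to weighting (or extending) the left and right context terms unequally.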

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95