Topic

TIMIT

About: TIMIT is a research topic. To date, 1401 publications within this topic have received 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
01 Dec 2008
TL;DR: This paper shows that the automatic voiced-unvoiced segmentation obtained with the proposed Analytic Signal-based measure and the manual voiced-unvoiced segmentation provided by TIMIT are very similar.
Abstract: This paper proposes a voiced-unvoiced measure based on the Analytic Signal computation. This voiced-unvoiced feature can be useful for many speech processing applications. For instance, in speech recognition it could be incorporated into commonly used acoustic feature vectors, such as the Mel Frequency Cepstral Coefficients (MFCC) and their first two derivatives, in order to improve the performance of the overall system. The developed measure has been evaluated on the TIMIT database. TIMIT has been manually segmented into phones, so the voicing information can easily be derived from this segmentation. It is shown in this paper that the automatic voiced-unvoiced segmentation obtained using the method described in the next sections and the manual voiced-unvoiced segmentation provided by TIMIT are very similar.

2 citations
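
The abstract above notes that voicing information can be derived from TIMIT's manual phone segmentation. The following is a minimal sketch of that derivation, not the authors' procedure. It assumes the segmentation is given as lines of `start_sample end_sample phone`, and the `VOICED` set is an illustrative, incomplete mapping chosen here; how closures, silence, and reduced vowels are treated is a design choice the abstract does not specify.

```python
# Illustrative subset of phone labels treated as voiced. This mapping is an
# assumption for the sketch; the paper does not list the one it used.
VOICED = {
    "iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "er",
    "ey", "ay", "oy", "aw", "ow",            # vowels and diphthongs
    "m", "n", "ng", "l", "r", "w", "y",      # nasals, liquids, glides
    "b", "d", "g", "v", "dh", "z", "zh", "jh",  # voiced obstruents
}


def voicing_segments(phn_text):
    """Turn 'start end phone' lines into merged (start, end, is_voiced) runs."""
    segments = []
    for line in phn_text.strip().splitlines():
        start, end, phone = line.split()
        start, end = int(start), int(end)
        voiced = phone in VOICED
        # Merge with the previous run when it is contiguous and has the same voicing.
        if segments and segments[-1][2] == voiced and segments[-1][1] == start:
            segments[-1] = (segments[-1][0], end, voiced)
        else:
            segments.append((start, end, voiced))
    return segments


# Hypothetical segmentation, made up for illustration (not taken from the corpus).
example = """0 2000 h#
2000 3500 sh
3500 5000 iy
5000 6200 m
6200 7000 s
"""

for start, end, voiced in voicing_segments(example):
    print(start, end, "voiced" if voiced else "unvoiced")
```

Anything not listed in `VOICED` is treated as unvoiced here, which is the simplest choice for a sketch rather than a phonetically complete one.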

Book Chapter
05 Oct 2014
TL;DR: The results show that multi-band processing clearly outperforms the baseline feature recombination method in every case tested and can be further enhanced by using the recently introduced technology of deep neural nets (DNNs).
Abstract: Spectro-temporal feature extraction and multi-band processing were both designed to make the speech recognizers more robust. Although they have been used for a long time now, very few attempts have been made to combine them. This is why here we integrate two spectro-temporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets both individually (as a baseline) and in combination with multi-band processing in phone recognition tasks on clean and noise contaminated versions of the TIMIT dataset. Our results show that multi-band processing clearly outperforms the baseline feature recombination method in every case tested. This improved performance can also be further enhanced by using the recently introduced technology of deep neural nets (DNNs).

2 citations

Journal Article
TL;DR: This article extends a successful and efficient kurtosis maximization algorithm, used for speech separation of two sources from two linear mixtures, to problems with arbitrary numbers of sources and mixtures.
Abstract: In many real‐world applications of blind source separation, the number of mixture signals, L, available for analysis often differs from the number of sources, M, which may be present. In this paper, we extend a successful and efficient kurtosis maximization algorithm used in speech separation of two sources from two linear mixtures for use in problems with arbitrary numbers of sources and mixtures. We examine three cases: underdetermined (M > L), critically determined (M = L), and overdetermined (M < L). In each of these cases, we present simulation results (using the TIMIT speech corpus) and discuss separation matrix initialization issues and observed algorithm limitations. We find that in the critically determined case, the algorithm performs well (20–40 dB SIR) at separating four sources from four mixtures. For the other cases, our results are mixed. In the overdetermined case (two sources, three mixtures), the algorithm performs well (20–40 dB SIR) and we find that the extra mixtures do not result in better SIR measurements. In the underdetermined case (three sources, two mixtures), we are able to separate out at least one source (sometimes two) with the other output signals each containing pairs of the remaining sources.

2 citations
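
The abstract above relies on kurtosis as the quantity being maximized. The sketch below shows only that statistic (sample excess kurtosis), not the authors' separation algorithm; the function name and the toy inputs are assumptions made for illustration. The general idea behind kurtosis-based separation is that a mixture of independent signals tends to look more Gaussian (excess kurtosis nearer zero) than a peaky source such as speech.

```python
def excess_kurtosis(samples):
    """Sample excess kurtosis: m4 / m2**2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Toy inputs chosen for illustration only.
flat = [1, 2, 3, 4, 5]                    # evenly spread values: negative excess kurtosis
peaky = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # mostly quiet with one burst: positive

print(excess_kurtosis(flat))
print(excess_kurtosis(peaky))
```

A separation method built on this statistic would adjust an unmixing matrix so that each output's kurtosis grows; that optimization step is not shown here.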

Proceedings Article
13 May 2002
TL;DR: A novel speech beam-former for noisy environments identifies the speech signal in the direction where the signal's spectrum entropy is minimized; as a result, the recognition rate increases significantly compared to the rate obtained by a single microphone.
Abstract: Detection of the speaker position is a crucial task in hands-free speech recognition applications. In this paper we present a novel speech beam-former for noisy environments. Initially, the localization algorithm extracts a set of candidate directions of the signal sources using array signal processing methods in the frequency domain. Then, a minimum variance (MV) beam-former identifies the speech signal in the direction where the signal's spectrum entropy is minimized. The proposed method is evaluated by a phoneme recognition system using noise recordings from an air-conditioning fan and the TIMIT speech corpus. Extended experiments, carried out in the range of 25–0 dB, show almost perfect estimation of the speaker DOA in all cases. As a consequence, the recognition rate increases significantly compared to the rate obtained by a single microphone. The recognition improvement is greatest at very low SNRs.

2 citations
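
The abstract above selects the direction whose signal has minimum spectrum entropy. The sketch below illustrates one common definition of spectral entropy (Shannon entropy of the normalized power spectrum) and a minimum-entropy pick among candidate signals. It is an assumption-laden illustration: the paper's exact entropy definition, framing, and beam-former are not given in the abstract, and `spectral_entropy` and `pick_min_entropy` are hypothetical names.

```python
import cmath
import math


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum of one frame."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):  # non-negative frequency bins of a real signal
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: define entropy as zero
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def pick_min_entropy(candidates):
    """Return the index of the candidate frame with the lowest spectral entropy."""
    return min(range(len(candidates)), key=lambda i: spectral_entropy(candidates[i]))


# Toy frames for illustration: a pure tone (peaked spectrum) and an impulse
# (flat spectrum). Structured signals give lower entropy than flat ones.
tone = [math.cos(2 * math.pi * 2 * i / 16) for i in range(16)]
impulse = [1.0] + [0.0] * 15

print(spectral_entropy(tone), spectral_entropy(impulse))
print(pick_min_entropy([impulse, tone]))
```

The naive DFT keeps the example dependency-free; a real implementation would use an FFT over windowed frames.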

Proceedings Article
01 Jan 2004
TL;DR: It is shown that while classification accuracy using Mel frequency cepstral coefficients as features does not improve with sub-banding, the accuracy increases from 36.1% to 42.0% using sub-banded reconstructed phase spaces to model the phonemes.
Abstract: This paper examines the use of multi-band reconstructed phase spaces as models for phoneme classification. Sub-banding reconstructed phase spaces combines linear, frequency-based techniques with a nonlinear modeling approach to speech recognition. Experiments comparing the effects of filtering speech signals for both reconstructed phase space and traditional speech recognition approaches are presented. These experiments study the use of two non-overlapping subbands for isolated phoneme classification on the TIMIT corpus. It is shown that while classification accuracy using Mel frequency cepstral coefficients as features does not improve with sub-banding, the accuracy increases from 36.1% to 42.0% using sub-banded reconstructed phase spaces to model the phonemes.

2 citations
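
The abstract above models phonemes with reconstructed phase spaces. A reconstructed phase space is commonly built by time-delay embedding, where each point collects lagged samples of the signal. The sketch below shows that embedding only; the embedding dimension, the lag, the sub-band filtering, and the classifier used in the paper are not stated in the abstract, so the parameter values here are placeholders.

```python
def reconstruct_phase_space(signal, dim, lag):
    """Time-delay embedding: point i is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n_points = len(signal) - (dim - 1) * lag
    if n_points <= 0:
        return []  # signal too short for this dimension and lag
    return [
        tuple(signal[i + j * lag] for j in range(dim))
        for i in range(n_points)
    ]


# Placeholder signal and parameters, for illustration only.
points = reconstruct_phase_space([0, 1, 2, 3, 4], dim=2, lag=2)
print(points)
```

In a sub-banded variant, the same embedding would be applied separately to each band-filtered copy of the signal before modeling the resulting point clouds.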


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95