scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 1999
TL;DR: This post-mortem parsing algorithm combines syntactic parsing rules, morphological recognition, and a closed-class lexicon with a method that first attempts to parse a sentence with a limited prediction for unknown words, and later reparses the sentence with a broader prediction if the first attempts fail.
Abstract: We present a parsing system designed to parse sentences containing unknown words as accurately as possible. Our post-mortem parsing algorithm combines syntactic parsing rules, morphological recognition, and a closed-class lexicon with a method that first attempts to parse a sentence with a limited prediction for unknown words, and later reparses the sentence with a broader prediction if the first attempts fail. This allows great flexibility while parsing, and can offer improved accuracy and efficiency for parsing sentences that contain unknown words. Experiments involving hand-created and computer-generated morphological recognizers are performed. We also develop a part-of-speech tagging system designed to accurately tag sentences, including sentences containing unknown words. The system is based on a basic hidden Markov model, but uses second-order approximations for the probability distributions (instead of first-order). The second-order approximations give increased tagging accuracy without increasing asymptotic running time over traditional trigram taggers. A dynamic smoothing technique addresses sparse data by attaching more weight to events that occur more frequently. Unknown words are predicted using statistical estimation from the training corpus based on word endings only. Information from different-length suffixes is included in a weighted voting scheme, smoothed in a fashion similar to that used for the second-order HMM. This tagging model achieves state-of-the-art accuracies. Finally, the use of syntactic parsing rules to increase tagging accuracy is considered. By allowing a parser to veto possible tag sequences due to violation of syntactic rules, tagging errors were reduced by 28% on the TIMIT corpus. This enhancement is useful for corpora that have rule sets defined.
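The suffix-based unknown-word prediction with weighted voting described above can be sketched in a few lines. The suffix lengths, the linear weighting scheme, and the toy tag set below are illustrative assumptions, not the paper's actual parameters:

```python
# Sketch of suffix-based unknown-word tag prediction with weighted voting.
# Counts, suffix lengths, and weights are illustrative, not the paper's.
from collections import Counter, defaultdict

def build_suffix_model(tagged_words, max_len=4):
    """Count tag frequencies for every word suffix up to max_len characters."""
    model = defaultdict(Counter)
    for word, tag in tagged_words:
        for k in range(1, min(max_len, len(word)) + 1):
            model[word[-k:]][tag] += 1
    return model

def predict_tag(word, model, max_len=4):
    """Vote over suffixes; longer suffixes get proportionally more weight."""
    votes = Counter()
    for k in range(1, min(max_len, len(word)) + 1):
        suffix = word[-k:]
        total = sum(model[suffix].values())
        if total == 0:
            continue
        for tag, count in model[suffix].items():
            votes[tag] += k * count / total  # weight grows with suffix length
    return votes.most_common(1)[0][0] if votes else None

train = [("running", "VBG"), ("jumping", "VBG"), ("quickly", "RB"),
         ("happily", "RB"), ("dog", "NN")]
model = build_suffix_model(train)
print(predict_tag("walking", model))  # the "-ing" suffixes vote for VBG
```

A real tagger would smooth these votes against the HMM's tag priors, as the abstract describes, rather than using raw relative frequencies.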

1 citation

Journal Article
TL;DR: A novel representation of speech for cases where the speech signal is corrupted by additive noise is introduced, reducing additive-noise effects via a filtering stage based on adaptive filtering in the spectral domain followed by a gammachirp filter.
Abstract: The goal of robust feature extraction is to improve the performance of speech recognition in adverse conditions. This paper introduces a novel representation of speech for cases where the speech signal is corrupted by additive noise. In this method, the speech features are computed by reducing additive-noise effects via a filtering stage based on adaptive filtering in the spectral domain, followed by filtering with a gammachirp filter. A task of isolated word recognition was used to demonstrate the efficiency of these robust features. To improve the robustness of speech, we introduce in this paper a new set of PLP vectors. The above-mentioned technique was tested with white noise and colored noise, such as airport, exhibition, and babble noises, under various noisy conditions within the TIMIT and AURORA databases. Experimental results show significant improvement in comparison to the results obtained using traditional feature extraction techniques.
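As a rough illustration of the spectral-domain filtering stage, here is a minimal spectral-subtraction sketch — one plausible reading of an adaptive spectral filter, not the paper's implementation; the gammachirp stage is omitted and all signals and parameters are invented for the example:

```python
# Minimal spectral-subtraction sketch (an assumption about the spectral-domain
# filtering stage; not the paper's actual adaptive filter or gammachirp stage).
import numpy as np

def spectral_subtract(frame, noise_psd, floor=0.01):
    """Subtract a noise power estimate from a frame's spectrum."""
    spec = np.fft.rfft(frame)
    power = np.abs(spec) ** 2
    # Keep a small spectral floor so no bin is driven fully to zero
    clean_power = np.maximum(power - noise_psd, floor * power)
    cleaned = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(cleaned, n=len(frame))

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 50 * np.arange(256) / 256)   # toy "speech" frame
noise = 0.3 * rng.standard_normal(256)
noise_psd = np.abs(np.fft.rfft(noise)) ** 2            # noise estimate
denoised = spectral_subtract(tone + noise, noise_psd)
```

In a full front end this cleaned spectrum would then pass through the auditory filterbank before PLP-style feature computation.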

1 citation

Posted Content
TL;DR: The results indicate that, given a good E2E model pre-trained on normal speech, a relatively small set of whispered speech may suffice to obtain a reasonably good end-to-end whispered speech recognizer.
Abstract: Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data. In this paper, we present several approaches for end-to-end (E2E) recognition of whispered speech, considering the special characteristics of whispered speech and the scarcity of data. This includes a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor for better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model with normal speech and then fine-tunes it with whispered speech to bridge the gap between whispered and normal speech. We achieve an overall relative reduction of 19.8% in PER and 31.9% in CER on a relatively small whispered TIMIT corpus. The results indicate that, as long as we have a good E2E model pre-trained on normal speech, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
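A frequency-weighted SpecAugment policy can be illustrated with a short sketch that biases frequency masking toward high-frequency bins. The linear weighting, mask width, and spectrogram shape here are illustrative assumptions, not the paper's actual policy:

```python
# Sketch of a frequency-weighted SpecAugment mask (illustrative weighting,
# not the paper's policy): higher bins are masked more often.
import numpy as np

def freq_weighted_mask(spec, max_width=8, rng=None):
    """Mask one frequency band of a (freq_bins, time_frames) spectrogram,
    with the band's start position biased toward high-frequency bins."""
    rng = rng if rng is not None else np.random.default_rng()
    bins = spec.shape[0]
    # Sampling weight grows linearly with bin index
    weights = np.arange(1, bins + 1, dtype=float)
    start = rng.choice(bins, p=weights / weights.sum())
    width = rng.integers(1, max_width + 1)
    out = spec.copy()
    out[start:start + width, :] = 0.0
    return out

spec = np.ones((80, 100))          # toy log-mel spectrogram
aug = freq_weighted_mask(spec, rng=np.random.default_rng(1))
```

Standard SpecAugment samples mask positions uniformly; the weighting above is one simple way to concentrate augmentation on the high-frequency structure that the abstract says matters for whispered speech.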

1 citation

Proceedings ArticleDOI
11 Jun 2007
TL;DR: Although the performance of pitch frequency alone is poor on telephone speech, it provides an enhancement in identification performance when used in combination with mel-frequency cepstral coefficients (MFCC).
Abstract: In this paper, the impact of pitch frequency on speaker identification using a Gaussian mixture model has been investigated, employing clean speech (TIMIT) and telephone speech (NTIMIT) databases. Pitch frequency, being directly related to the human vocal tract, may also be used as a speaker-discriminating feature in noisy environments, such as telephone lines. Although the performance of pitch frequency alone is poor on telephone speech, it provides an 8.34% enhancement in identification performance when used in combination with mel-frequency cepstral coefficients (MFCC).
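Combining pitch with MFCCs at the feature level amounts to appending a per-frame pitch value to each cepstral vector before model scoring. A minimal sketch follows, with a single diagonal Gaussian standing in for the paper's GMM; all shapes and values are invented for illustration:

```python
# Sketch of feature-level pitch + MFCC fusion with a single diagonal Gaussian
# standing in for a GMM. All shapes and values are illustrative.
import numpy as np

def combine_features(mfcc, pitch):
    """Append a per-frame pitch value to each MFCC frame.
    mfcc: (frames, n_ceps); pitch: (frames,)."""
    return np.hstack([mfcc, pitch[:, None]])

def diag_gaussian_loglik(x, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var,
                         axis=1)

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((50, 13))            # toy 13-dim MFCC frames
pitch = 120 + 10 * rng.standard_normal(50)      # toy pitch track (Hz)
feats = combine_features(mfcc, pitch)

# "Train" the speaker model on its own frames and score them
mean, var = feats.mean(axis=0), feats.var(axis=0) + 1e-6
score = diag_gaussian_loglik(feats, mean, var).mean()
```

Identification then picks the speaker whose model yields the highest average log-likelihood over the test utterance's frames.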

1 citation

Proceedings ArticleDOI
01 Jul 2020
TL;DR: Kernel-based matching is proposed, using the histogram intersection kernel (HIK) as a matching metric for QbE-STD, together with training a CNN-based classifier on whole size-normalized images instead of splitting them into subimages as in [6].
Abstract: Query-by-Example based spoken term detection (QbE-STD) for audio search involves matching an audio query with the reference utterances to find the relevant utterances. QbE-STD involves computing a matching matrix between a query and a reference utterance using a suitable metric. In this work, we propose to use kernel-based matching, with the histogram intersection kernel (HIK) as the matching metric. A CNN-based approach to QbE-STD involves first converting a matching matrix to a corresponding size-normalized image and classifying the image as relevant or not [6]. In this work, we propose to train a CNN-based classifier using the size-normalized images directly, instead of splitting them into subimages as in [6]. The training approach proposed in this work is expected to be more effective, since there is less chance of a CNN-based classifier getting confused. The effectiveness of the proposed kernel-based matching and the novel training approach is studied using the TIMIT dataset.
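Computing a matching matrix with the histogram intersection kernel reduces to a sum of elementwise minima between each query frame and each reference frame. A minimal sketch (the frame vectors below are toy posterior-like values, not features from the paper):

```python
# Sketch of a HIK matching matrix between query and reference frame sequences.
# Frame vectors are toy non-negative posterior-like values.
import numpy as np

def hik_matching_matrix(query, ref):
    """M[i, j] = histogram intersection between query frame i and reference
    frame j, i.e. the sum over dimensions of the elementwise minimum.
    query: (Q, D); ref: (R, D); returns (Q, R)."""
    return np.minimum(query[:, None, :], ref[None, :, :]).sum(axis=2)

q = np.array([[0.7, 0.3],
              [0.1, 0.9]])
r = np.array([[0.6, 0.4],
              [0.2, 0.8],
              [0.5, 0.5]])
M = hik_matching_matrix(q, r)   # M[0, 0] = min(0.7, 0.6) + min(0.3, 0.4) = 0.9
```

This matrix is what would then be size-normalized into an image and fed to the CNN classifier described above.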

1 citation


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95