scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 1999
TL;DR: This post-mortem parsing algorithm combines syntactic parsing rules, morphological recognition, and a closed-class lexicon with a method that first attempts to parse a sentence with a limited prediction for unknown words, and later reparses the sentence with a broader prediction if the first attempts fail.
Abstract: We present a parsing system designed to parse sentences containing unknown words as accurately as possible. Our post-mortem parsing algorithm combines syntactic parsing rules, morphological recognition, and a closed-class lexicon with a method that first attempts to parse a sentence with a limited prediction for unknown words, and later reparses the sentence with a broader prediction if the first attempts fail. This allows great flexibility while parsing, and can offer improved accuracy and efficiency for parsing sentences that contain unknown words. Experiments involving hand-created and computer-generated morphological recognizers are performed. We also develop a part-of-speech tagging system designed to accurately tag sentences, including sentences containing unknown words. The system is based on a basic hidden Markov model, but uses second-order approximations for the probability distributions (instead of first-order). The second-order approximations give increased tagging accuracy without increasing asymptotic running time over traditional trigram taggers. A dynamic smoothing technique addresses sparse data by attaching more weight to events that occur more frequently. Unknown words are predicted using statistical estimation from the training corpus based on word endings only. Information from different-length suffixes is included in a weighted voting scheme, smoothed in a fashion similar to that used for the second-order HMM. This tagging model achieves state-of-the-art accuracies. Finally, the use of syntactic parsing rules to increase tagging accuracy is considered. By allowing a parser to veto possible tag sequences due to violation of syntactic rules, tagging errors were reduced by 28% on the TIMIT corpus. This enhancement is useful for corpora that have rule sets defined.
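The suffix-based unknown-word prediction with weighted voting described above can be sketched in a few lines. The suffix lengths, the linear weighting scheme, and the toy tag set below are illustrative assumptions, not the paper's actual parameters:

```python
# Sketch of suffix-based unknown-word tag prediction with weighted voting.
# Counts, suffix lengths, and weights are illustrative, not the paper's.
from collections import Counter, defaultdict

def build_suffix_model(tagged_words, max_len=4):
    """Count tag frequencies for every word suffix up to max_len characters."""
    model = defaultdict(Counter)
    for word, tag in tagged_words:
        for k in range(1, min(max_len, len(word)) + 1):
            model[word[-k:]][tag] += 1
    return model

def predict_tag(word, model, max_len=4):
    """Vote over suffixes; longer suffixes get proportionally more weight."""
    votes = Counter()
    for k in range(1, min(max_len, len(word)) + 1):
        suffix = word[-k:]
        total = sum(model[suffix].values())
        if total == 0:
            continue
        for tag, count in model[suffix].items():
            votes[tag] += k * count / total  # weight grows with suffix length
    return votes.most_common(1)[0][0] if votes else None

train = [("running", "VBG"), ("jumping", "VBG"), ("quickly", "RB"),
         ("happily", "RB"), ("dog", "NN")]
model = build_suffix_model(train)
print(predict_tag("walking", model))  # the "-ing" suffixes vote for VBG
```

A real tagger would smooth these votes against the HMM's tag priors, as the abstract describes, rather than using raw relative frequencies.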

1 citation

Journal Article
TL;DR: A novel representation of speech for cases where the speech signal is corrupted by additive noise is introduced, reducing additive-noise effects via a filtering stage based on adaptive filtering in the spectral domain followed by a gammachirp filter.
Abstract: The goal of robust feature extraction is to improve the performance of speech recognition in adverse conditions. This paper introduces a novel representation of speech for cases where the speech signal is corrupted by additive noise. In this method, the speech features are computed by reducing additive-noise effects via a filtering stage based on adaptive filtering in the spectral domain, followed by filtering with a gammachirp filter. A task of isolated word recognition was used to demonstrate the efficiency of these robust features. To improve the robustness of speech, we introduce in this paper a new set of PLP vectors. The above-mentioned technique was tested with white noise and colored noise, such as airport, exhibition, and babble noises, under various noisy conditions within the TIMIT and AURORA databases. Experimental results show significant improvement in comparison to the results obtained using traditional feature extraction techniques.
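As a rough illustration of the spectral-domain filtering stage, here is a minimal spectral-subtraction sketch — one plausible reading of an adaptive spectral filter, not the paper's implementation; the gammachirp stage is omitted and all signals and parameters are invented for the example:

```python
# Minimal spectral-subtraction sketch (an assumption about the spectral-domain
# filtering stage; not the paper's actual adaptive filter or gammachirp stage).
import numpy as np

def spectral_subtract(frame, noise_psd, floor=0.01):
    """Subtract a noise power estimate from a frame's spectrum."""
    spec = np.fft.rfft(frame)
    power = np.abs(spec) ** 2
    # Keep a small spectral floor so no bin is driven fully to zero
    clean_power = np.maximum(power - noise_psd, floor * power)
    cleaned = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(cleaned, n=len(frame))

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 50 * np.arange(256) / 256)   # toy "speech" frame
noise = 0.3 * rng.standard_normal(256)
noise_psd = np.abs(np.fft.rfft(noise)) ** 2            # noise estimate
denoised = spectral_subtract(tone + noise, noise_psd)
```

In a full front end this cleaned spectrum would then pass through the auditory filterbank before PLP-style feature computation.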

1 citation

Posted Content
TL;DR: The results indicate that, given a good E2E model pre-trained on normal speech, a relatively small set of whispered speech may suffice to obtain a reasonably good end-to-end whispered speech recognizer.
Abstract: Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data. In this paper, we present several approaches for end-to-end (E2E) recognition of whispered speech, considering the special characteristics of whispered speech and the scarcity of data. This includes a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor for better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model with normal speech and then fine-tunes it with whispered speech to bridge the gap between whispered and normal speech. We achieve an overall relative reduction of 19.8% in PER and 31.9% in CER on a relatively small whispered TIMIT corpus. The results indicate that, as long as we have a good E2E model pre-trained on normal speech, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
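A frequency-weighted SpecAugment policy can be illustrated with a short sketch that biases frequency masking toward high-frequency bins. The linear weighting, mask width, and spectrogram shape here are illustrative assumptions, not the paper's actual policy:

```python
# Sketch of a frequency-weighted SpecAugment mask (illustrative weighting,
# not the paper's policy): higher bins are masked more often.
import numpy as np

def freq_weighted_mask(spec, max_width=8, rng=None):
    """Mask one frequency band of a (freq_bins, time_frames) spectrogram,
    with the band's start position biased toward high-frequency bins."""
    rng = rng if rng is not None else np.random.default_rng()
    bins = spec.shape[0]
    # Sampling weight grows linearly with bin index
    weights = np.arange(1, bins + 1, dtype=float)
    start = rng.choice(bins, p=weights / weights.sum())
    width = rng.integers(1, max_width + 1)
    out = spec.copy()
    out[start:start + width, :] = 0.0
    return out

spec = np.ones((80, 100))          # toy log-mel spectrogram
aug = freq_weighted_mask(spec, rng=np.random.default_rng(1))
```

Standard SpecAugment samples mask positions uniformly; the weighting above is one simple way to concentrate augmentation on the high-frequency structure that the abstract says matters for whispered speech.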

1 citation

Proceedings ArticleDOI
11 Jun 2007
TL;DR: Although the performance of pitch frequency alone is poor on telephone speech, it provides an enhancement in identification performance when used in combination with mel-frequency cepstral coefficients (MFCC).
Abstract: In this paper, the impact of pitch frequency on speaker identification using a Gaussian mixture model has been investigated, employing clean speech (TIMIT) and telephone speech (NTIMIT) databases. Pitch frequency, being directly related to the human vocal tract, may also be used as a speaker-discriminating feature in noisy environments, such as telephone lines. Although the performance of pitch frequency alone is poor on telephone speech, it provides an 8.34% enhancement in identification performance when used in combination with mel-frequency cepstral coefficients (MFCC).
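Combining pitch with MFCCs at the feature level amounts to appending a per-frame pitch value to each cepstral vector before model scoring. A minimal sketch follows, with a single diagonal Gaussian standing in for the paper's GMM; all shapes and values are invented for illustration:

```python
# Sketch of feature-level pitch + MFCC fusion with a single diagonal Gaussian
# standing in for a GMM. All shapes and values are illustrative.
import numpy as np

def combine_features(mfcc, pitch):
    """Append a per-frame pitch value to each MFCC frame.
    mfcc: (frames, n_ceps); pitch: (frames,)."""
    return np.hstack([mfcc, pitch[:, None]])

def diag_gaussian_loglik(x, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var,
                         axis=1)

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((50, 13))            # toy 13-dim MFCC frames
pitch = 120 + 10 * rng.standard_normal(50)      # toy pitch track (Hz)
feats = combine_features(mfcc, pitch)

# "Train" the speaker model on its own frames and score them
mean, var = feats.mean(axis=0), feats.var(axis=0) + 1e-6
score = diag_gaussian_loglik(feats, mean, var).mean()
```

Identification then picks the speaker whose model yields the highest average log-likelihood over the test utterance's frames.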

1 citation

Proceedings ArticleDOI
01 Jul 2020
TL;DR: Kernel-based matching is proposed, using the histogram intersection kernel (HIK) as a matching metric for QbE-STD, together with training a CNN-based classifier on whole size-normalized images instead of splitting them into subimages as in [6].
Abstract: Query-by-Example based spoken term detection (QbE-STD) for audio search involves matching an audio query with the reference utterances to find the relevant utterances. QbE-STD involves computing a matching matrix between a query and a reference utterance using a suitable metric. In this work, we propose to use kernel-based matching, with the histogram intersection kernel (HIK) as the matching metric. A CNN-based approach to QbE-STD involves first converting a matching matrix to a corresponding size-normalized image and classifying the image as relevant or not [6]. In this work, we propose to train a CNN-based classifier using the size-normalized images directly, instead of splitting them into subimages as in [6]. The training approach proposed in this work is expected to be more effective, since there is less chance of a CNN-based classifier getting confused. The effectiveness of the proposed kernel-based matching and the novel training approach is studied using the TIMIT dataset.
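Computing a matching matrix with the histogram intersection kernel reduces to a sum of elementwise minima between each query frame and each reference frame. A minimal sketch (the frame vectors below are toy posterior-like values, not features from the paper):

```python
# Sketch of a HIK matching matrix between query and reference frame sequences.
# Frame vectors are toy non-negative posterior-like values.
import numpy as np

def hik_matching_matrix(query, ref):
    """M[i, j] = histogram intersection between query frame i and reference
    frame j, i.e. the sum over dimensions of the elementwise minimum.
    query: (Q, D); ref: (R, D); returns (Q, R)."""
    return np.minimum(query[:, None, :], ref[None, :, :]).sum(axis=2)

q = np.array([[0.7, 0.3],
              [0.1, 0.9]])
r = np.array([[0.6, 0.4],
              [0.2, 0.8],
              [0.5, 0.5]])
M = hik_matching_matrix(q, r)   # M[0, 0] = min(0.7, 0.6) + min(0.3, 0.4) = 0.9
```

This matrix is what would then be size-normalized into an image and fed to the CNN classifier described above.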

1 citation


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95