Topic
TIMIT
About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
TL;DR: Experimental results on the TIMIT phone recognition task and a large vocabulary continuous speech recognition task of telehealth captioning demonstrated that the proposed approach of integrating template matching with statistical modeling significantly improved recognition accuracy over the hidden Markov modeling baselines for both TIMIT and telehealth tasks.
Abstract: We propose a novel approach of integrating exemplar-based template matching with statistical modeling to improve continuous speech recognition. We choose the template unit to be context-dependent phone segments (triphone context) and use multiple Gaussian mixture model (GMM) indices to represent each frame of speech templates. We investigate two different local distances, log likelihood ratio (LLR) and Kullback-Leibler (KL) divergence, for dynamic time warping (DTW)-based template matching. In order to reduce computation and storage complexities, we also propose two methods for template selection: minimum distance template selection (MDTS) and maximum likelihood template selection (MLTS). We further propose to fine-tune the MLTS template representatives by using a GMM merging algorithm so that the GMMs can better represent the frames of the selected template representatives. Experimental results on the TIMIT phone recognition task and a large vocabulary continuous speech recognition (LVCSR) task of telehealth captioning demonstrated that the proposed approach of integrating template matching with statistical modeling significantly improved recognition accuracy over the hidden Markov modeling (HMM) baselines for both TIMIT and telehealth tasks. The template selection methods also provided significant accuracy gains over the HMM baseline while largely reducing the computation and storage complexities. When all templates or MDTS were used, the LLR local distance gave better performance than the KL local distance. For MLTS and template compression, the KL local distance gave better performance than the LLR local distance, and template compression further improved the recognition accuracy on top of MLTS at a lower computational cost.
3 citations
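The DTW-based template matching with a KL local distance described in the abstract above can be illustrated with a small sketch. Note the simplifications: each frame here is modeled by a single diagonal Gaussian rather than the paper's multiple GMM indices, and the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    # Symmetric KL divergence between two diagonal Gaussians
    # (means m1, m2 and variances v1, v2, all 1-D arrays).
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1.0)
    return 0.5 * (kl12 + kl21)

def dtw(template, query, local_dist):
    # Classic DTW: template and query are lists of (mean, var) frame
    # models; local_dist scores one template frame against one query frame.
    T, Q = len(template), len(query)
    D = np.full((T + 1, Q + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, Q + 1):
            d = local_dist(*template[i - 1], *query[j - 1])
            # Standard step pattern: insertion, deletion, or match.
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, Q]
```

The quadratic cost of this alignment over many templates is exactly why the paper's template selection methods (MDTS, MLTS) matter: they shrink the set of templates each query must be warped against.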
11 Apr 2014
TL;DR: An algorithm for spotting fricative consonants in continuous speech that relies only on features extracted directly from the audio signal and on common classification techniques, making it simple to implement and language-invariant.
Abstract: We present an algorithm for spotting fricative consonants in continuous speech. Fricative spotting can be useful in professional audio applications, where excessive accentuation of these phonemes can degrade the aesthetics of voice recordings, or in applications for the hearing-impaired, where certain manipulations can increase their perception. All stages of our algorithm rely only on features extracted directly from the audio signal and on common classification techniques, making it simple to implement and language-invariant. In the first stage, a linear classifier, pre-trained using Fisher's Linear Discriminant Analysis (LDA), is used to detect fricatives inside speech sentences. In the second stage, the detected phonemes are further analyzed using a decision-tree classifier, attempting to reject false detections. Tested on the full corpus of the TIMIT audio database, the algorithm achieved very good detection rates across the entire range of fricative phonemes.
3 citations
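The two-stage detector described above (LDA detection, then decision-tree rejection) can be sketched with scikit-learn. Everything here is an assumption for illustration: the two synthetic features stand in for whatever frame features the authors extract (e.g. high-band energy and zero-crossing rate are plausible but not confirmed), and the cluster parameters are invented.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic 2-D frame features; fricative frames (label 1) cluster at
# higher values but overlap with non-fricative frames (label 0).
X_fric = rng.normal(loc=[0.6, 0.5], scale=0.15, size=(200, 2))
X_other = rng.normal(loc=[0.3, 0.3], scale=0.15, size=(200, 2))
X = np.vstack([X_fric, X_other])
y = np.array([1] * 200 + [0] * 200)

# Stage 1: LDA linear classifier flags candidate fricative frames.
lda = LinearDiscriminantAnalysis().fit(X, y)
candidates = lda.predict(X) == 1

# Stage 2: a decision tree re-examines only the flagged frames,
# attempting to reject stage-1 false detections.
tree = DecisionTreeClassifier(max_depth=3).fit(X[candidates], y[candidates])
final = tree.predict(X[candidates])
```

In practice the second stage would be trained on held-out detections rather than the same frames, but the cascade structure is the point of the sketch.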
01 Jan 1992
TL;DR: This work uses a very detailed, biologically motivated input representation of the speech tokens, Lyon's cochlear model as implemented by Slaney [20], to produce results comparable to those obtained by others without the addition of time normalization.
Abstract: We report results on vowel and stop consonant recognition with tokens extracted from the TIMIT database. Our current system differs from others doing similar tasks in that we do not use any specific time normalization techniques. We use a very detailed biologically motivated input representation of the speech tokens, Lyon's cochlear model as implemented by Slaney [20]. This detailed, high-dimensional representation, known as a cochleagram, is classified by either a back-propagation or a hybrid supervised/unsupervised neural network classifier. The hybrid network is composed of a biologically motivated unsupervised network and a supervised back-propagation network. This approach produces results comparable to those obtained by others without the addition of time normalization.
3 citations
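The general idea of a cochleagram, a per-band energy envelope over time, can be sketched crudely. To be clear about the assumptions: Lyon's model is a detailed cascade of filters with automatic gain control; the code below substitutes a simple log-spaced Butterworth filterbank with half-wave rectification and frame averaging, so it is only a rough stand-in, and all parameter choices are hypothetical.

```python
import numpy as np
from scipy.signal import butter, lfilter

def cochleagram_like(x, sr, n_bands=16, fmin=100.0, fmax=3800.0, frame=256):
    """Crude cochleagram stand-in: log-spaced bandpass filterbank,
    half-wave rectification, then per-frame energy averaging.
    (Lyon's cascade-of-filters cochlear model is far more detailed.)"""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Frequencies are normalized to Nyquist for scipy's butter().
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        band = np.maximum(lfilter(b, a, x), 0.0)  # half-wave rectify
        frames = band[: len(band) // frame * frame].reshape(-1, frame)
        out.append(frames.mean(axis=1))
    return np.array(out)  # shape: (n_bands, n_frames)
```

The result is a high-dimensional time-frequency image of the kind the abstract feeds to its neural classifiers.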
01 Sep 2014
TL;DR: It was observed that vowels are inserted by L1 Bengali speakers to break up consonant clusters or to avoid a syllable-final coda consonant.
Abstract: As the importance of English grows day by day, second-language learners need to acquire the language properly, and proper acquisition involves correct pronunciation. Read speech of “The North Wind and the Sun” from forty native (L1) Bengali speakers was analyzed to find the phonetic and phonological problems in L1 Bengali speakers' English speech. During the study, automatic phoneme alignment was carried out with the HTK toolkit and a modified TIMIT dictionary. The results show that L1 Bengali speakers substitute new English consonant and vowel phonemes with Bengali sounds that are close to those English sounds. As for phonological problems, it was observed that vowels are inserted by L1 Bengali speakers to break up consonant clusters or to avoid a syllable-final coda consonant. The effect of fluency on the phonetic and phonological problems of L1 Bengali speakers is also presented in the paper.
3 citations
22 Jun 2021
TL;DR: In this paper, a hardware-software co-design for efficient sparse deep neural network (DNN) implementation in a regular systolic array for real-time on-device speech processing is presented.
Abstract: This paper presents a hardware-software co-design for efficient sparse deep neural network (DNN) implementation in a regular systolic array for real-time on-device speech processing. The weight pruning format, exploiting pattern-based coordinate-assisted (PICA) sparsity, extends pattern-based pruning to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs). It reduces index storage overhead and avoids accuracy degradation. The proposed systolic accelerator leverages intrinsic data reuse and locality to accommodate PICA-based sparsity without complex data distribution networks, and it supports DNNs with different topologies. While reducing the model size by 16x, PICA sparsification cuts index storage overhead by 6.02x and still achieves 20.7% WER on the TIMIT dataset. For the pruned WaveNet and LSTM, the accelerator achieves 0.62 and 2.69 TOPS/W energy efficiency, 1.7x to 10x higher than the state of the art.
3 citations
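The core idea behind pattern-based pruning, restricting each weight block to one of a few predefined sparsity masks so that only a small pattern index needs storing per block, can be sketched as follows. The 1x4 block width and the four-pattern library below are hypothetical choices for illustration, not the paper's actual PICA pattern set.

```python
import numpy as np

# A small library of allowed sparsity patterns for 1x4 weight blocks;
# each pattern keeps exactly 2 of 4 weights (a hypothetical choice).
PATTERNS = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=bool)

def pattern_prune(w):
    """Prune each 4-wide block to the pattern retaining the most weight
    magnitude. Returns the pruned weights plus one pattern index per
    block, which is the only index metadata that must be stored.
    Assumes w.size is divisible by the block width (4)."""
    blocks = w.reshape(-1, 4)
    # Score each pattern by the absolute magnitude it would retain.
    scores = np.abs(blocks) @ PATTERNS.T.astype(float)
    idx = scores.argmax(axis=1)
    pruned = blocks * PATTERNS[idx]
    return pruned.reshape(w.shape), idx
```

Because every block conforms to a known pattern, a systolic array can route the surviving weights with fixed, regular dataflow instead of per-weight coordinate lists, which is the storage and hardware-regularity win the abstract describes.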