Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
TL;DR: In this article, the authors provide a comparative analysis of three approaches to speech activity detection: classification of still images (CNN model), classification based on previous images (CRNN model), and classification of sequences of images (Seq2Seq model).
Abstract: Existing literature on speech activity detection (SAD) highlights different neural-network approaches but does not provide a comprehensive comparison of these methods. This matters because such neural approaches often require hardware-intensive resources. In this article, we provide a comparative analysis of three different approaches: classification with still images (CNN model), classification based on previous images (CRNN model), and classification of sequences of images (Seq2Seq model). Our experimental results on the Vid-TIMIT dataset show that the CNN model achieves an accuracy of 97%, whereas the CRNN and Seq2Seq models raise classification accuracy to 99%. Further experiments show that the CRNN model is almost as accurate as the Seq2Seq model (99.1% vs. 99.6% classification accuracy, respectively) but 57% faster to train (326 vs. 761 secs. per epoch).
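As a quick sanity check (not from the paper's code), the quoted "57% faster to train" figure follows directly from the reported per-epoch times; the helper name below is an illustrative assumption:

```python
def relative_speedup(slower_secs: float, faster_secs: float) -> float:
    """Fraction by which per-epoch training time is reduced."""
    return (slower_secs - faster_secs) / slower_secs

# Per-epoch times reported in the abstract: Seq2Seq 761 s, CRNN 326 s.
speedup = relative_speedup(761, 326)  # about 0.57, i.e. the quoted "57% faster"
```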
01 Jan 2022
TL;DR: This article explores three different types of methods for DNN-based speaker meta-information estimation and compares the estimation results between the original speech and the anonymized speech using the TIMIT dataset.
Abstract: There have been concerns about how speech data is collected and shared in the real world, since human speech itself carries personally identifiable information about the speaker and can be used to reliably estimate speaker meta-information. In this paper, we explore three different types of methods for DNN-based speaker meta-information estimation and compare the estimation results between the original speech and the anonymized speech. We used a McAdams-coefficient-based signal processing technique to produce the anonymized, privacy-preserving speech data. Experiments on the TIMIT dataset show a slight degradation in performance on anonymized speech relative to the original. Experiments reveal that a model employing both DNN-based embeddings and voice anonymization can achieve performance comparable to the model using the original speech.
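The McAdams-coefficient technique mentioned above warps the phase angles of a frame's LPC poles, shifting formant positions while preserving pole radii. A minimal sketch of that pole-warping step (the function name and the default alpha=0.8 are illustrative assumptions, not taken from the paper):

```python
import cmath
import math

def mcadams_warp(poles, alpha=0.8):
    """Warp LPC pole phase angles phi -> phi**alpha (McAdams coefficient).
    alpha != 1 moves formant frequencies while keeping pole magnitudes,
    changing perceived speaker identity; near-real poles are left alone."""
    warped = []
    for p in poles:
        r, phi = abs(p), cmath.phase(p)
        if abs(phi) > 1e-6:  # only warp complex (formant-bearing) poles
            phi = math.copysign(abs(phi) ** alpha, phi)
        warped.append(cmath.rect(r, phi))
    return warped
```

In a full pipeline this step would sit between LPC analysis and resynthesis of each frame; only the pole angles change, so the residual signal can be reused.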
05 Apr 2022
TL;DR: The authors propose a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used to account for the temporal properties of speech signals.
Abstract: As an extension of the variational autoencoder (VAE), the complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used. First, to account for the temporal property of speech signals, this work introduces a complex-valued recurrent neural network into the complex VAE framework. In addition, the L1 loss is used as the reconstruction loss in this framework. To exemplify the use of the complex generative model in speech processing, we choose speech enhancement as the specific application in this paper. Experiments are based on the TIMIT dataset. The results show that the proposed method improves objective metrics of speech intelligibility and signal quality.
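For complex-valued data, an L1 reconstruction loss reduces to the mean complex modulus of the residual, so real and imaginary errors are penalized jointly. A minimal, framework-free sketch (the function name is an assumption for illustration):

```python
def complex_l1_loss(x, x_hat):
    """Mean absolute error for complex-valued sequences: abs(a - b) is the
    complex modulus of the residual, not a componentwise absolute value."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)

# e.g. two complex STFT coefficients vs. a rough reconstruction
loss = complex_l1_loss([1 + 1j, 2 - 1j], [1 + 0j, 2 + 0j])  # -> 1.0
```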
01 Jan 1994
TL;DR: Clean speech databases such as TIMIT (Garofolo et al., 1988) have been available, but a telephone speech database suitable for talker identification research (Godfrey, 1992) was not generally available at the time of this research.
Abstract: It is difficult to implement talker recognition on the telephone network because of normal variation in the channel characteristics. The primary component of variation is due to the different telephone handsets or microphone frequency characteristics (Rosenberg and Soong, 1992). Lack of availability of telephone speech databases has also contributed to slow progress in the solution of these problems, though clean speech databases such as TIMIT (Garofolo et al., 1988) have been available. A telephone speech database suitable for talker identification research (Godfrey, 1992) was not generally available at the time of this research.
01 Dec 2017
TL;DR: A DNN trained on a system combination of Mel-filterbank energies and SBAE features exploits complementary information present in the speech signal to aid representation learning.
Abstract: Recently, unsupervised representation learning of features from speech signals has seen a tremendous upsurge in speech processing applications. In this paper, we investigate a modified autoencoder architecture, the subband autoencoder (SBAE), for representation learning. Features were learned from the spectrogram given as input to the SBAE for the speech recognition task. SBAE features and Mel-filterbank energies as spectral features were trained separately on DNNs, and system combination was then applied. This technique was applied to speech recognition on the TIMIT and WSJ0 databases. On the TIMIT database, we achieved an absolute improvement of 3% in PER on the test set over Mel-filterbank energies alone. For the WSJ0 database, we achieved a relative improvement of 9.89% in WER on the test sets compared to filterbank energies. Hence, a DNN trained on the system combination of Mel-filterbank energies and SBAE features captures complementary information present in the speech signal.
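The system combination described above can be read as frame-level fusion of the two DNN posterior streams. A minimal sketch, assuming simple linear score averaging (the abstract does not specify the exact combination rule, and the function name is illustrative):

```python
def combine_posteriors(fbank_post, sbae_post, w=0.5):
    """Linearly fuse per-frame class posteriors from the Mel-filterbank DNN
    and the SBAE-feature DNN; w weights the filterbank stream."""
    return [[w * a + (1 - w) * b for a, b in zip(fa, fb)]
            for fa, fb in zip(fbank_post, sbae_post)]

# one frame, two classes: filterbank says [0.8, 0.2], SBAE says [0.4, 0.6]
fused = combine_posteriors([[0.8, 0.2]], [[0.4, 0.6]])  # approx [[0.6, 0.4]]
```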