Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers published on a yearly basis

Papers


Proceedings Article

A new wavelet thresholding method for speech enhancement based on symmetric Kullback-Leibler divergence

Shima Tabibian¹, Ahmad Akbari¹, Babak Nasersharif²

Iran University of Science and Technology¹, University of Gilan²

08 Dec 2009

TL;DR: A new method is proposed to determine the threshold value based on the symmetric Kullback-Leibler divergence between the probability distributions of noisy speech and noise wavelet coefficients using segmental SNR.


Abstract: The performance of wavelet thresholding methods for speech enhancement depends on estimating an exact threshold value in the wavelet sub-bands. In this paper, we propose a new method for estimating the threshold value more exactly. We propose to determine the threshold value based on the symmetric Kullback-Leibler divergence between the probability distributions of the noisy speech and noise wavelet coefficients. In the next step, we improve this value using segmental SNR. We used a subset of TIMIT utterances to assess the performance of the proposed threshold. The algorithm is evaluated using the PESQ score and the SNR improvement. On average, we obtain a 2 dB SNR improvement and a PESQ score increase of up to 0.7 in comparison with conventional wavelet thresholding approaches.
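As a rough illustration of the quantity this abstract builds on, the sketch below computes a symmetric Kullback-Leibler divergence between two histograms. It assumes the unscaled form KL(p||q) + KL(q||p) (some authors halve it), and the helper names, bin edges, and sample values are made up for the example. How the paper maps the divergence to a threshold, and how segmental SNR refines it, is not stated in the abstract and is not reproduced here.

```python
import math


def histogram(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two discrete distributions.

    Inputs may be unnormalized counts. A small eps is added to every bin
    (an assumption of this sketch) so that empty bins do not produce log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p_i - q_i) * log(p_i / q_i)).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


# Toy coefficient magnitudes (made-up numbers, not TIMIT data).
noisy_coeffs = [0.1, 0.4, 0.9, 1.3, 1.8, 0.2]
noise_coeffs = [0.1, 0.2, 0.3, 0.1, 0.4, 0.2]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]

divergence = symmetric_kl(histogram(noisy_coeffs, edges), histogram(noise_coeffs, edges))
print(divergence)
```

The divergence is zero when the two histograms match and grows as the noisy-speech coefficients move away from the noise-only distribution.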


20 citations

Proceedings Article

A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech

Shareef Babu Kalluri¹, Deepu Vijayasenan¹, Sriram Ganapathy²

National Institute of Technology, Karnataka¹, Indian Institute of Science²

08 May 2019

TL;DR: A unified DNN architecture to predict both the height and age of a speaker from short durations of speech is proposed, and a novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset.


Abstract: Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker for short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system is able to improve the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input the performance degradation is at most 3% compared to the full-duration speech files.
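The height and age results in this abstract are root-mean-square errors. For readers who want the definition spelled out, here is a minimal sketch; the function name and the sample heights are made up for illustration and are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical heights in cm (illustrative values only).
predicted_cm = [170.0, 180.0]
actual_cm = [173.0, 176.0]
print(rmse(predicted_cm, actual_cm))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.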


20 citations

Journal Article

Convolutional support vector machines for speech recognition

Vishal Passricha¹, Rajesh Kumar Aggarwal¹

National Institute of Technology, Kurukshetra¹

01 Sep 2019-International Journal of Speech Technology

TL;DR: A new deep architecture is proposed that combines two heterogeneous classification techniques, CNNs and support vector machines (SVMs); it improves the result by 13.33% and 2.31% over a baseline CNN and segmental recurrent neural networks, respectively.


Abstract: Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance on automatic speech recognition. Most CNNs use a softmax activation function for prediction and minimize the cross-entropy loss. This paper proposes a new deep architecture in which two heterogeneous classification techniques, CNNs and support vector machines (SVMs), are combined. In the proposed model, features are learned by the convolution layers and classified by SVMs. The last layer of the CNN, i.e., the softmax layer, is replaced by SVMs to deal efficiently with high-dimensional features. This model should be interpreted as a special form of structured SVM and is named the convolutional support vector machine (CSVM). Instead of training each component separately, the parameters of the CNN and SVMs are jointly trained using frame-level max-margin, sequence-level max-margin, and state-level minimum Bayes risk criteria. The performance of CSVM is evaluated on the TIMIT and Wall Street Journal datasets for phone recognition. By incorporating the features of both CNN and SVMs, CSVM improves the result by 13.33% and 2.31% over a baseline CNN and segmental recurrent neural networks, respectively.
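To make the idea of a frame-level max-margin criterion concrete, the sketch below evaluates a standard multiclass hinge loss (the Crammer-Singer form) on one frame's class scores. This is an assumption about what such a criterion can look like, not the paper's exact objective; the function name, the margin value, and the scores are illustrative only.

```python
def multiclass_hinge(scores, target, margin=1.0):
    """Multiclass hinge loss for one frame.

    The loss is zero only when the target class score exceeds every other
    class score by at least `margin`; otherwise it grows linearly with the
    size of the violation by the strongest competing class.
    """
    if len(scores) < 2:
        raise ValueError("need at least two classes")
    if not 0 <= target < len(scores):
        raise ValueError("target index out of range")
    best_competitor = max(s for i, s in enumerate(scores) if i != target)
    return max(0.0, margin + best_competitor - scores[target])


# Hypothetical per-class scores for a single frame.
frame_scores = [2.0, 0.5, 1.5]
print(multiclass_hinge(frame_scores, target=0))
print(multiclass_hinge(frame_scores, target=1))
```

Unlike cross-entropy on softmax outputs, this loss stops penalizing a frame once the correct class wins by the required margin.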


20 citations

Journal Article

A lattice search technique for a long-contextual-span hidden trajectory model of speech

Dong Yu¹, Li Deng¹, Alex Acero¹

Microsoft¹

01 Sep 2006-Speech Communication

TL;DR: Improved likelihood score computation in the HTM and a novel A*-based time-asynchronous lattice-constrained decoding algorithm for HTM evaluation are described, and the new search algorithm on recognition lattices is shown to improve recognition accuracy over the traditional N-best rescoring paradigm.


20 citations

Proceedings Article

Blind phoneme segmentation with temporal prediction errors

Paul Michel¹, Okko Räsänen², Roland Thiollière, Emmanuel Dupoux

Carnegie Mellon University¹, Aalto University²

01 Jul 2017

TL;DR: In this article, an unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks is proposed for phonemic segmentation of speech, which consists in analyzing the error profile of a model trained to predict speech features frame-by-frame.


Abstract: Phonemic segmentation of speech is a critical step in speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists in analyzing the error profile of a model trained to predict speech features frame by frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
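The boundary-hypothesis step described in this abstract can be sketched as simple peak picking on a per-frame prediction error curve. The sketch assumes the error values have already been produced by some frame-level predictor; the function name, the optional height filter, and the error values are hypothetical, and the paper's actual peak-selection rules may differ.

```python
def boundary_candidates(errors, min_height=0.0):
    """Return frame indices that are local maxima of the prediction error.

    A frame counts as a peak if its error is strictly greater than the
    previous frame's and at least as large as the next frame's, so a flat
    plateau yields a single candidate at its first frame. Peaks below
    `min_height` are discarded.
    """
    peaks = []
    for t in range(1, len(errors) - 1):
        rises = errors[t] > errors[t - 1]
        does_not_rise_after = errors[t] >= errors[t + 1]
        if rises and does_not_rise_after and errors[t] >= min_height:
            peaks.append(t)
    return peaks


# Hypothetical per-frame prediction errors (not real TIMIT output).
frame_errors = [0.1, 0.5, 0.2, 0.1, 0.9, 0.3, 0.35, 0.2]
print(boundary_candidates(frame_errors))
print(boundary_candidates(frame_errors, min_height=0.4))
```

Raising `min_height` trades recall for precision: small bumps in the error curve stop being reported as boundaries.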


20 citations


Network Information

Performance

Metrics

Papers: 1,488

Citations: 68,688

No. of papers in the topic in previous years
Year	Papers
2023	24
2022	62
2021	67
2020	86
2019	77
2018	95
