Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers
20 Nov 1995
TL;DR: A relational database management system has been developed to house the speech data; it provides much more usability, flexibility and expandability than file-based speech corpora such as TIMIT.
Abstract: A collection of digits and words, spoken with a New Zealand English accent, has been systematically and formally collected. This collection, along with the beginning and end points of the realised phonemes within the words, comprises the Otago Speech Corpora. A relational database management system has been developed to house the speech data. This system provides much more usability, flexibility and expandability than file-based speech corpora such as TIMIT.
31 citations
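The abstract above does not specify the database schema, but a minimal relational layout for a phoneme-annotated corpus can be sketched with Python's built-in sqlite3 module. The table and column names here are assumptions for illustration, not the Otago system's actual design:

```python
import sqlite3

# Hypothetical minimal schema: one table of utterances, one table of
# phoneme segments with begin/end sample points (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE utterance (
    id INTEGER PRIMARY KEY,
    speaker TEXT,
    text TEXT,
    sample_rate INTEGER
);
CREATE TABLE phoneme (
    id INTEGER PRIMARY KEY,
    utterance_id INTEGER REFERENCES utterance(id),
    label TEXT,
    start_sample INTEGER,   -- realised phoneme begin point
    end_sample INTEGER      -- realised phoneme end point
);
""")
conn.execute("INSERT INTO utterance VALUES (1, 'nz01', 'seven', 22050)")
conn.executemany(
    "INSERT INTO phoneme VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 's', 0, 3200), (2, 1, 'eh', 3200, 6100),
     (3, 1, 'v', 6100, 8000), (4, 1, 'ax', 8000, 9500),
     (5, 1, 'n', 9500, 12000)],
)
# Unlike a file-based corpus such as TIMIT, segments can be selected
# declaratively rather than by parsing per-file annotation files:
rows = conn.execute(
    "SELECT label, start_sample, end_sample FROM phoneme "
    "WHERE utterance_id = 1 ORDER BY start_sample"
).fetchall()
print(rows[0])  # → ('s', 0, 3200)
```

Queries of this kind illustrate the flexibility argument: adding new annotation tiers is a schema change, not a file-format change.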
TL;DR: A novel discriminative objective function for estimating hidden Markov model (HMM) parameters, based on the calculation of overall risk; it minimises the risk of misclassification on the training database and thus maximises recognition accuracy.
31 citations
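An overall-risk objective of this kind is commonly written as an expected loss over the training set; the notation below is a generic sketch, not the paper's own formulation:

```latex
% Sketch of an overall-risk criterion (symbols assumed, not from the paper):
% O_n -- n-th training observation sequence, c_n -- its true class,
% W_j -- candidate class, \ell(j, c_n) -- loss for deciding W_j when c_n is true.
R(\lambda) \;=\; \sum_{n=1}^{N} \sum_{j} \ell(j, c_n)\, P_\lambda(W_j \mid O_n)
```

Minimising $R(\lambda)$ over the HMM parameters $\lambda$ pushes posterior mass away from misclassifications on the training database, which is the sense in which the objective is discriminative.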
TL;DR: An energy-constrained signal subspace (ECSS) method is proposed for speech enhancement and automatic speech recognition under additive noise conditions; the ECSS method was found to achieve very high word recognition accuracy (WRA) on the digits set under low-SNR conditions.
30 citations
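The ECSS details are not given in the TL;DR, but the general signal-subspace idea it builds on can be sketched: eigendecompose the frame covariance of the noisy signal and keep only the directions whose energy exceeds an assumed noise floor. The toy signal, frame length, and threshold below are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "speech": a sinusoid buried in white noise.
n = 2000
clean = np.sin(2 * np.pi * 0.05 * np.arange(n))
noisy = clean + 0.5 * rng.standard_normal(n)

dim = 20  # analysis frame length (one sinusoid period, by construction)
frames = np.lib.stride_tricks.sliding_window_view(noisy, dim)[::dim]
cov = frames.T @ frames / len(frames)

# Eigendecomposition of the frame covariance; the noise-floor estimate
# plays the role of the energy constraint.
w, v = np.linalg.eigh(cov)
noise_floor = 0.5 ** 2          # assumed known noise variance
keep = w > noise_floor          # retain signal-dominated directions only
proj = v[:, keep] @ v[:, keep].T

enhanced = (frames @ proj).reshape(-1)
err_noisy = np.mean((frames.reshape(-1) - clean[: len(enhanced)]) ** 2)
err_enh = np.mean((enhanced - clean[: len(enhanced)]) ** 2)
print(err_enh < err_noisy)  # projecting out noise directions reduces error
```

Discarding the low-energy eigen-directions removes most of the noise while preserving the signal-dominated subspace, which is why subspace methods help recognisers at low SNR.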
05 Mar 2017
TL;DR: A recently developed deep learning model, the recurrent convolutional neural network (RCNN), is proposed for speech processing; it inherits some merits of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) and is competitive with previous methods in terms of accuracy and efficiency.
Abstract: Different neural networks have exhibited excellent performance on various speech processing tasks, and they usually have specific advantages and disadvantages. We propose to use a recently developed deep learning model, the recurrent convolutional neural network (RCNN), for speech processing; it inherits some merits of recurrent neural networks (RNN) and convolutional neural networks (CNN). The core module can be viewed as a convolutional layer embedded with an RNN, which enables the model to capture both temporal and frequency dependence in the spectrogram of the speech in an efficient way. The model is tested on the TIMIT speech corpus for phoneme recognition and on IEMOCAP for emotion recognition. Experimental results show that the model is competitive with previous methods in terms of accuracy and efficiency.
30 citations
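The "convolutional layer embedded with an RNN" idea can be sketched in NumPy: convolve over the frequency axis of each spectrogram frame, then update a hidden state recurrently across time. The shapes, pooling, and tanh nonlinearity below are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# time steps, frequency bins, kernel width, hidden units (all assumed)
T, F, K, H = 12, 40, 5, 8
spec = rng.standard_normal((T, F))  # fake log-spectrogram

kernels = rng.standard_normal((H, K)) * 0.1   # one 1-D frequency kernel per unit
U = rng.standard_normal((H, H)) * 0.1         # recurrent weights

h = np.zeros(H)
outputs = []
for t in range(T):
    # Frequency dependence: convolve each kernel with the current frame,
    # then max-pool over the frequency axis.
    conv = np.array([np.convolve(spec[t], k, mode="valid").max() for k in kernels])
    # Temporal dependence: recurrent update across frames.
    h = np.tanh(conv + U @ h)
    outputs.append(h)

outputs = np.stack(outputs)
print(outputs.shape)  # (12, 8): one hidden vector per spectrogram frame
```

The single loop shows why the module is efficient: one convolution per frame captures frequency structure, while the recurrence carries temporal context without a separate RNN stack.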
TL;DR: The proposed model is a convolutional neural network that operates directly on the raw waveform; it is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle and reaches state-of-the-art performance on both data sets.
Abstract: We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries. As such, the proposed model is trained in a fully unsupervised manner, with no manual annotations in the form of target boundaries or phonetic transcriptions. We compare the proposed approach to several unsupervised baselines using both the TIMIT and Buckeye corpora. Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets. Furthermore, we experimented with expanding the training set with additional examples from the Librispeech corpus. We evaluated the resulting model on distributions and languages that were not seen during the training phase (English, Hebrew and German) and showed that utilizing additional untranscribed data is beneficial for model performance.
30 citations
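The test-time peak detection step described above can be sketched as a simple local-maximum pass over frame-wise boundary scores. The threshold and neighbourhood rule here are assumptions; the paper's exact algorithm is not given in the abstract:

```python
import numpy as np

def pick_peaks(scores, threshold=0.5):
    """Return indices that are local maxima above `threshold`
    (a minimal sketch of test-time peak picking, not the paper's method)."""
    scores = np.asarray(scores, dtype=float)
    peaks = []
    for i in range(1, len(scores) - 1):
        if (scores[i] > threshold
                and scores[i] >= scores[i - 1]
                and scores[i] > scores[i + 1]):
            peaks.append(i)
    return peaks

# Frame-wise "boundary scores" as a model might emit them:
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.6, 0.8, 0.2, 0.1]
print(pick_peaks(scores))  # → [2, 6]
```

Each returned index would be mapped back to a time position to yield a predicted phoneme boundary, with no transcriptions needed at any stage.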