scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: A novel method is proposed for source smartphone identification that uses encoding characteristics as the intrinsic fingerprint of recording devices, with Variance Threshold and SVM-RFE used to choose the optimal features.
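The Variance-Threshold step of a pipeline like the one described can be sketched in a few lines. This is an illustration, not the paper's code: near-constant features carry little device-specific information, so they are dropped before a ranking step such as SVM-RFE is applied to the survivors.

```python
# Illustrative sketch (not the paper's implementation): the Variance-Threshold
# stage of a feature-selection pipeline, in pure Python. Columns whose variance
# falls below a cutoff are discarded; SVM-RFE would then rank the remaining
# features (not shown here).

def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def variance_threshold(X, threshold=0.0):
    """Return indices of feature columns whose variance exceeds `threshold`.

    X is a list of samples (rows = recordings), each a list of feature values
    (columns = encoding-characteristic features).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    return [j for j, col in enumerate(columns) if variance(col) > threshold]

# Toy example: column 1 is constant across recordings, so it is discarded.
X = [[1.0, 5.0, 0.1],
     [2.0, 5.0, 0.4],
     [3.0, 5.0, 0.2]]
kept = variance_threshold(X, threshold=0.01)
print(kept)  # -> [0, 2]
```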

13 citations

Proceedings ArticleDOI
19 Apr 2015
TL;DR: The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors; this work investigates whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks.
Abstract: The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors. Here, we investigate whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks. One method for this is to train a hierarchy of two networks, while the less well-known split temporal context (STC) method models the left and right contexts of a frame separately. Here, we evaluate these techniques within a convolutional neural network framework, and find that the two approaches can be nicely combined. With the combined model we can expand the time-span of our network to 69 frames, and we achieve a 7.5% relative error rate reduction compared to modeling this large context as one block. We report a phone error rate of 17.1% on the TIMIT core test set, which is one of the best scores published.
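The split-idea above can be sketched without the actual CNNs: instead of one network seeing the whole 69-frame block, separate sub-networks summarise the left and right contexts of the centre frame, and their outputs are concatenated. In this hedged illustration each sub-network is stubbed as a per-dimension mean; the paper's sub-networks are convolutional.

```python
# Minimal sketch of the split temporal context (STC) idea, not the paper's
# network: the left and right halves of a long frame context are summarised
# by separate "sub-networks" whose outputs are concatenated for a final
# classifier. Each sub-network is stubbed here as a per-dimension average.

def mean_pool(frames):
    """Stand-in for a sub-network: average a list of feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def split_temporal_context(frames, centre, half_span):
    """Model the left and right contexts of frame `centre` separately."""
    left = frames[centre - half_span : centre + 1]    # centre frame included
    right = frames[centre : centre + half_span + 1]   # on both sides
    return mean_pool(left) + mean_pool(right)         # concatenated summary

# Toy sequence of 1-D frames 0..68 (a 69-frame span, as in the paper).
frames = [[float(t)] for t in range(69)]
combined = split_temporal_context(frames, centre=34, half_span=34)
print(combined)  # -> [17.0, 51.0]
```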

13 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work trains a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data and exceeds the clustering performance of all previous approaches on the well-known TIMIT dataset.
Abstract: Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features etc., room for improvements stems from the fact that these embeddings are trained with a surrogate task that is rather far away from segregating unknown voices — namely, identifying few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset that has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
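The training signal described above (weak labels saying only whether two segments come from the same speaker) can be illustrated with a simple contrastive loss on embedding pairs. This is a sketch of the objective, not the authors' code: same-speaker embeddings are pulled together, different-speaker embeddings are pushed past a margin.

```python
# Hedged sketch of a pairwise training objective for weakly labeled speaker
# embeddings (not the paper's implementation): a contrastive loss that is zero
# for close same-speaker pairs and for different-speaker pairs beyond a margin.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Loss encouraging equal speakers to be close, regardless of identity."""
    d = euclidean(emb_a, emb_b)
    if same_speaker:
        return d ** 2                    # pull same-speaker embeddings together
    return max(0.0, margin - d) ** 2     # push different speakers past the margin

# Identical embeddings from the same speaker: no loss.
print(contrastive_loss([0.1, 0.2], [0.1, 0.2], same_speaker=True))   # -> 0.0
# Identical embeddings from different speakers: full margin penalty.
print(contrastive_loss([0.1, 0.2], [0.1, 0.2], same_speaker=False))  # -> 1.0
```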

12 citations

Proceedings Article
19 Jun 2016
TL;DR: An expectation-maximization (EM) based online CTC algorithm is introduced that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling and can also be trained to process an infinitely long input sequence without pre-segmentation or external reset.
Abstract: Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of training sequences is usually not uniform, which makes parallel training with multiple sequences inefficient on shared memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. Our approach achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set. In the end-to-end speech recognition task on the Wall Street Journal corpus, a network can be trained with only 64 times of unrolling with little performance loss.
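The memory argument in the abstract (a fixed amount of unrolling versus unrolling over the whole sequence) can be sketched independently of the EM-based CTC algorithm itself. The toy "RNN" below is a scalar recurrence, an assumption for illustration only; the point is that processing in fixed-size chunks while carrying the hidden state forward reproduces full unrolling exactly, with bounded memory.

```python
# Sketch of the limited-unrolling idea (not the EM-CTC algorithm itself): a
# unidirectional RNN can process an arbitrarily long input in fixed-size
# chunks, carrying its hidden state across chunk boundaries, so only `unroll`
# timesteps ever need to be held in memory at once.

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar 'RNN': new state from old state and input."""
    return w_h * h + w_x * x

def run_in_chunks(inputs, unroll):
    """Process `inputs` chunk by chunk, state flowing between chunks."""
    h = 0.0
    for start in range(0, len(inputs), unroll):
        chunk = inputs[start : start + unroll]  # at most `unroll` steps in memory
        for x in chunk:
            h = rnn_step(h, x)
    return h

def run_fully_unrolled(inputs):
    h = 0.0
    for x in inputs:
        h = rnn_step(h, x)
    return h

seq = [0.1 * t for t in range(1000)]
# Chunked processing yields exactly the same final state as full unrolling.
print(run_in_chunks(seq, unroll=64) == run_fully_unrolled(seq))  # -> True
```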

12 citations

Proceedings ArticleDOI
13 Dec 2012
TL;DR: A new algorithm for speech segmentation based on image analysis of the signal's spectrogram: a first loop segments the sound to locate the speech signal, and the segmented speech is then fed back into the algorithm for phoneme segmentation.
Abstract: This paper presents a new algorithm for speech segmentation based on image analysis of the spectrogram of the signal. The algorithm works in two loops: the first segments the sound in search for the speech signal. The segmented speech returns to the algorithm for phoneme segmentation. For evaluation, the algorithm was applied to TIMIT speech signals with correct speech segmentation of every tested signal, including signals under real-world noise.
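The outer loop's job (find the speech region before refining it into phonemes) can be roughly illustrated with short-time energy thresholding. This is a stand-in, not the paper's method, which analyses the spectrogram as an image; the sketch only shows the segment-then-refine structure.

```python
# Rough illustration of the outer segmentation loop (not the paper's
# image-analysis method): locate the speech region in a signal by
# thresholding short-time frame energy. The inner loop would then
# re-segment only this region into phonemes.

def frame_energies(signal, frame_len):
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return [sum(s * s for s in f) for f in frames]

def find_speech_region(signal, frame_len=4, threshold=0.5):
    """Return (start, end) sample indices spanning the high-energy frames."""
    energies = frame_energies(signal, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Silence, then "speech", then silence.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
print(find_speech_region(signal))  # -> (8, 16)
```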

12 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95