scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: A novel method is proposed for source smartphone identification that uses encoding characteristics as the intrinsic fingerprint of recording devices, with Variance Threshold and SVM-RFE used to choose the optimal features.
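The Variance-Threshold step of a pipeline like the one described can be sketched in a few lines. This is an illustration, not the paper's code: near-constant features carry little device-specific information, so they are dropped before a ranking step such as SVM-RFE is applied to the survivors.

```python
# Illustrative sketch (not the paper's implementation): the Variance-Threshold
# stage of a feature-selection pipeline, in pure Python. Columns whose variance
# falls below a cutoff are discarded; SVM-RFE would then rank the remaining
# features (not shown here).

def variance(column):
    """Population variance of one feature column."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)

def variance_threshold(X, threshold=0.0):
    """Return indices of feature columns whose variance exceeds `threshold`.

    X is a list of samples (rows = recordings), each a list of feature values
    (columns = encoding-characteristic features).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    return [j for j, col in enumerate(columns) if variance(col) > threshold]

# Toy example: column 1 is constant across recordings, so it is discarded.
X = [[1.0, 5.0, 0.1],
     [2.0, 5.0, 0.4],
     [3.0, 5.0, 0.2]]
kept = variance_threshold(X, threshold=0.01)
print(kept)  # -> [0, 2]
```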

13 citations

Proceedings ArticleDOI
19 Apr 2015
TL;DR: The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors; this work investigates whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks.
Abstract: The deep neural network component of current hybrid speech recognizers is trained on a context of consecutive feature vectors. Here, we investigate whether the time span of this input can be extended by splitting it up and modeling it in smaller chunks. One method for this is to train a hierarchy of two networks, while the less well-known split temporal context (STC) method models the left and right contexts of a frame separately. Here, we evaluate these techniques within a convolutional neural network framework, and find that the two approaches can be nicely combined. With the combined model we can expand the time-span of our network to 69 frames, and we achieve a 7.5% relative error rate reduction compared to modeling this large context as one block. We report a phone error rate of 17.1% on the TIMIT core test set, which is one of the best scores published.
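The split-idea above can be sketched without the actual CNNs: instead of one network seeing the whole 69-frame block, separate sub-networks summarise the left and right contexts of the centre frame, and their outputs are concatenated. In this hedged illustration each sub-network is stubbed as a per-dimension mean; the paper's sub-networks are convolutional.

```python
# Minimal sketch of the split temporal context (STC) idea, not the paper's
# network: the left and right halves of a long frame context are summarised
# by separate "sub-networks" whose outputs are concatenated for a final
# classifier. Each sub-network is stubbed here as a per-dimension average.

def mean_pool(frames):
    """Stand-in for a sub-network: average a list of feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def split_temporal_context(frames, centre, half_span):
    """Model the left and right contexts of frame `centre` separately."""
    left = frames[centre - half_span : centre + 1]    # centre frame included
    right = frames[centre : centre + half_span + 1]   # on both sides
    return mean_pool(left) + mean_pool(right)         # concatenated summary

# Toy sequence of 1-D frames 0..68 (a 69-frame span, as in the paper).
frames = [[float(t)] for t in range(69)]
combined = split_temporal_context(frames, centre=34, half_span=34)
print(combined)  # -> [17.0, 51.0]
```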

13 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: This work trains a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data and exceeds the clustering performance of all previous approaches on the well-known TIMIT dataset.
Abstract: Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features etc., room for improvements stems from the fact that these embeddings are trained with a surrogate task that is rather far away from segregating unknown voices — namely, identifying few specific speakers. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset that has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, but require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
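The training signal described above (weak labels saying only whether two segments come from the same speaker) can be illustrated with a simple contrastive loss on embedding pairs. This is a sketch of the objective, not the authors' code: same-speaker embeddings are pulled together, different-speaker embeddings are pushed past a margin.

```python
# Hedged sketch of a pairwise training objective for weakly labeled speaker
# embeddings (not the paper's implementation): a contrastive loss that is zero
# for close same-speaker pairs and for different-speaker pairs beyond a margin.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Loss encouraging equal speakers to be close, regardless of identity."""
    d = euclidean(emb_a, emb_b)
    if same_speaker:
        return d ** 2                    # pull same-speaker embeddings together
    return max(0.0, margin - d) ** 2     # push different speakers past the margin

# Identical embeddings from the same speaker: no loss.
print(contrastive_loss([0.1, 0.2], [0.1, 0.2], same_speaker=True))   # -> 0.0
# Identical embeddings from different speakers: full margin penalty.
print(contrastive_loss([0.1, 0.2], [0.1, 0.2], same_speaker=False))  # -> 1.0
```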

12 citations

Proceedings Article
19 Jun 2016
TL;DR: An expectation-maximization (EM) based online CTC algorithm is introduced that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling and can also be trained to process an infinitely long input sequence without pre-segmentation or external reset.
Abstract: Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of training sequences is usually not uniform, which makes parallel training with multiple sequences inefficient on shared memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. Our approach achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set. In the end-to-end speech recognition task on the Wall Street Journal corpus, a network can be trained with only 64 times of unrolling with little performance loss.
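The memory argument in the abstract (a fixed amount of unrolling versus unrolling over the whole sequence) can be sketched independently of the EM-based CTC algorithm itself. The toy "RNN" below is a scalar recurrence, an assumption for illustration only; the point is that processing in fixed-size chunks while carrying the hidden state forward reproduces full unrolling exactly, with bounded memory.

```python
# Sketch of the limited-unrolling idea (not the EM-CTC algorithm itself): a
# unidirectional RNN can process an arbitrarily long input in fixed-size
# chunks, carrying its hidden state across chunk boundaries, so only `unroll`
# timesteps ever need to be held in memory at once.

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar 'RNN': new state from old state and input."""
    return w_h * h + w_x * x

def run_in_chunks(inputs, unroll):
    """Process `inputs` chunk by chunk, state flowing between chunks."""
    h = 0.0
    for start in range(0, len(inputs), unroll):
        chunk = inputs[start : start + unroll]  # at most `unroll` steps in memory
        for x in chunk:
            h = rnn_step(h, x)
    return h

def run_fully_unrolled(inputs):
    h = 0.0
    for x in inputs:
        h = rnn_step(h, x)
    return h

seq = [0.1 * t for t in range(1000)]
# Chunked processing yields exactly the same final state as full unrolling.
print(run_in_chunks(seq, unroll=64) == run_fully_unrolled(seq))  # -> True
```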

12 citations

Proceedings ArticleDOI
13 Dec 2012
TL;DR: A new algorithm for speech segmentation based on image analysis of the signal's spectrogram: a first loop segments the sound to locate the speech signal, and the segmented speech is then fed back into the algorithm for phoneme segmentation.
Abstract: This paper presents a new algorithm for speech segmentation based on image analysis of the spectrogram of the signal. The algorithm works in two loops: the first segments the sound in search for the speech signal. The segmented speech returns to the algorithm for phoneme segmentation. For evaluation, the algorithm was applied to TIMIT speech signals with correct speech segmentation of every tested signal, including signals under real-world noise.
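The outer loop's job (find the speech region before refining it into phonemes) can be roughly illustrated with short-time energy thresholding. This is a stand-in, not the paper's method, which analyses the spectrogram as an image; the sketch only shows the segment-then-refine structure.

```python
# Rough illustration of the outer segmentation loop (not the paper's
# image-analysis method): locate the speech region in a signal by
# thresholding short-time frame energy. The inner loop would then
# re-segment only this region into phonemes.

def frame_energies(signal, frame_len):
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return [sum(s * s for s in f) for f in frames]

def find_speech_region(signal, frame_len=4, threshold=0.5):
    """Return (start, end) sample indices spanning the high-energy frames."""
    energies = frame_energies(signal, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Silence, then "speech", then silence.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
print(find_speech_region(signal))  # -> (8, 16)
```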

12 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95