TIMIT Acoustic-Phonetic Continuous Speech Corpus

Open Access Dataset

TLDR

The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. It includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.

Abstract:

The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). The transcriptions have been hand-verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included, as well as written documentation.
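As a rough illustration of what "time-aligned" means at a 16 kHz sampling rate, the sketch below parses alignment lines of the form `start_sample end_sample label` and converts sample offsets to seconds. The line layout, the function and type names, and the three example lines are assumptions made for illustration; the abstract does not specify the file format, so check the corpus documentation before relying on it.

```python
from typing import List, NamedTuple

# Stated in the corpus description: waveforms are sampled at 16 kHz.
SAMPLE_RATE_HZ = 16_000


class Segment(NamedTuple):
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    label: str      # phone or word label


def parse_alignment(text: str, sample_rate: int = SAMPLE_RATE_HZ) -> List[Segment]:
    """Parse lines of 'start_sample end_sample label' (assumed layout)."""
    segments = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append(
            Segment(int(start) / sample_rate, int(end) / sample_rate, label.strip())
        )
    return segments


# Illustrative lines only, not guaranteed to match any file in the corpus.
example = "0 3050 h#\n3050 4559 sh\n4559 5723 ix\n"
segments = parse_alignment(example)
for seg in segments:
    print(f"{seg.start_s:.4f} {seg.end_s:.4f} {seg.label}")
```

Dividing by the sampling rate is the only corpus-specific step here; the same function would apply to word-level alignments if they use the same three-column layout.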

Citations

Journal Article

LSTM: A Search Space Odyssey

Klaus Greff, +4 more

- 01 Oct 2017 -

IEEE Transactions on Neural Networks

TL;DR: This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. It observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.

Book

Supervised Sequence Labelling with Recurrent Neural Networks

Alex Graves

TL;DR: This book presents a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

Posted Content

Sequence Transduction with Recurrent Neural Networks

Alex Graves

- 14 Nov 2012 -

arXiv: Neural and Evolutionary Computing

TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.

Posted Content

Attention-Based Models for Speech Recognition

Jan Chorowski, +4 more

- 24 Jun 2015 -

arXiv: Computation and Language

TL;DR: The attention mechanism is extended with features needed for speech recognition, and a novel, generic method of adding location-awareness to the attention mechanism is proposed to alleviate the issue of high phoneme error rate.

Journal Article

On training targets for supervised speech separation

Yuxuan Wang, +2 more

- 01 Dec 2014 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking-based targets are, in general, significantly better than spectral-envelope-based targets.
