
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
10 Mar 2001
TL;DR: It was found that error-free data recovery resulted in voiced and unvoiced frames, while high bit-errors occurred in frames containing voiced/unvoiced boundaries, and modifying the phase, in accordance with data, led to higher successful retrieval than modifying the spectral density of the cover audio.
Abstract: This paper presents results of two methods of embedding digital audio data into another audio signal for secure communication. The data-embedded, or stego, signal is created for transmission by modifying the power spectral density or the phase spectrum of the cover audio at the perceptually masked frequencies in each frame in accordance with the covert audio data. Embedded data in each frame is recovered from the quantized frames of the received stego signal without synchronization or reference to the original cover signal. Using utterances from Texas Instruments/Massachusetts Institute of Technology (TIMIT) databases, it was found that error-free data recovery resulted in voiced and unvoiced frames, while high bit-errors occurred in frames containing voiced/unvoiced boundaries. Modifying the phase, in accordance with data, led to higher successful retrieval than modifying the spectral density of the cover audio. In both cases, no difference was detected in perceived speech quality between the cover signal and the received stego signal.

8 citations

Journal Article
TL;DR: Preliminary results suggest that non-native speakers of English fail to produce flaps and reduced vowels, insert or delete segments, engage in more self-correction, and place pauses in different locations from native speakers.
Abstract: This study investigates differences in sentence and story production between native and non-native speakers of English for use with a system of Automatic Speech Recognition (ASR). Previous studies have shown that production errors by non-native speakers of English include misproduced segments (Flege, 1995), longer pause duration (Anderson-Hsieh and Venkatagiri, 1994), abnormal pause location within clauses (Kang, 2010), and non-reduction of function words (Jang, 2009). The present study uses phonemically balanced sentences from TIMIT (Garofolo et al., 1993) and a story to provide an additional comparison of the differences in production by native and non-native speakers of English. Consistent with previous research, preliminary results suggest that non-native speakers of English fail to produce flaps and reduced vowels, insert or delete segments, engage in more self-correction, and place pauses in different locations from native speakers. Non-native English speakers furthermore produce different patterns of intonation from native speakers and produce errors indicative of transfer from their L1 phonology, such as coda deletion and vowel epenthesis. Native speaker productions also contained errors, the majority of which were content-related. These results indicate that difficulties posed by English ASR systems in recognizing non-native speech are due largely to the heterogeneity of non-native production.

8 citations

Posted Content
TL;DR: This work proposes a novel unsupervised algorithm, based on sequence prediction models such as Markov chains and recurrent neural networks, that learns the dynamics of speech in the MFCC space and hypothesizes boundaries from local maxima in the prediction error.
Abstract: Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
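
The boundary-detection idea in this abstract (predict each frame, then look for local maxima in the prediction error) can be illustrated with a minimal sketch. It is not the authors' method: the Markov-chain or recurrent predictor is replaced here by a trivial persistence model (each frame is predicted to equal the previous one), the feature vectors are toy values rather than MFCCs, and the names `prediction_errors`, `local_maxima`, and `threshold` are hypothetical.

```python
import math

def prediction_errors(frames):
    """Euclidean error of a persistence predictor (assumption: frame t is
    predicted to equal frame t-1; a stand-in for a learned sequence model)."""
    errors = [0.0]  # no prediction is available for the first frame
    for prev, cur in zip(frames, frames[1:]):
        errors.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return errors

def local_maxima(errors, threshold=0.0):
    """Indices where the error rises from the previous frame, does not rise
    into the next frame, and exceeds the (hypothetical) threshold."""
    return [
        t for t in range(1, len(errors) - 1)
        if errors[t] > errors[t - 1]
        and errors[t] >= errors[t + 1]
        and errors[t] > threshold
    ]

# Toy 2-dimensional feature sequence with abrupt changes at frames 4 and 8
frames = [[0.0, 0.0]] * 4 + [[3.0, 4.0]] * 4 + [[3.0, 0.0]] * 4
errors = prediction_errors(frames)
boundaries = local_maxima(errors, threshold=1.0)
print(boundaries)  # [4, 8]
```

With a learned predictor, only `prediction_errors` would change; the peak-picking step stays the same.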

8 citations

Journal Article
TL;DR: A new feature set, namely, the Histogram of the DCT-Cepstrum Coefficients, inspired by the common use of the MFCC, but simpler and faster in computation is introduced.
Abstract: There are several known feature sets for text-independent speaker-identification systems, most of which depend on spectral information. Among the most successful of these is the set of Mel-Frequency Cepstrum Coefficients (MFCC). This paper introduces a new feature set, namely, the Histogram of the DCT-Cepstrum Coefficients, inspired by the common use of the MFCC, but simpler and faster in computation. A text-independent speaker-identification system based on the DCT-Cepstrum Histogram and Gaussian Mixture Model (GMM) is implemented. The new feature was tested using speech files from the ELSDSR database and TIMIT corpus. The new feature set managed to achieve high efficiency rates with speaker identification accuracy of 100% on 23 speakers from the ELSDSR database, and 99% on 630 speakers from the TIMIT corpus.

8 citations

Book Chapter
19 Nov 2014
TL;DR: Experiments show that DMNs improve substantially the recognition accuracy over DNNs and other traditional techniques in both clean and noisy conditions on the TIMIT dataset.
Abstract: Deep Neural Networks (DNN) have become very popular for acoustic modeling due to the improvements found over traditional Gaussian Mixture Models (GMM). However, not many works have addressed the robustness of these systems under noisy conditions. Recently, the machine learning community has proposed new methods to improve the accuracy of DNNs by using techniques such as dropout and maxout. In this paper, we investigate Deep Maxout Networks (DMN) for acoustic modeling in a noisy automatic speech recognition environment. Experiments show that DMNs improve substantially the recognition accuracy over DNNs and other traditional techniques in both clean and noisy conditions on the TIMIT dataset.
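
The abstract does not describe the network itself, so the following is only a sketch of the standard maxout unit that such networks are built from: the unit outputs the maximum over several affine functions of its input. The function name `maxout` and the toy weights are illustrative assumptions, not taken from the paper.

```python
def maxout(x, weights, biases):
    """Maxout unit: the maximum over k affine pieces w_j . x + b_j."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )

# Two pieces with weights +1 and -1 (illustrative values) reproduce |x|
weights = [[1.0], [-1.0]]
biases = [0.0, 0.0]
outputs = [maxout([v], weights, biases) for v in (-2.0, 0.5, 3.0)]
print(outputs)  # [2.0, 0.5, 3.0]
```

Because the unit takes a maximum over learned linear pieces, it can represent any convex piecewise-linear activation, of which the absolute value above is one small example.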

8 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95