scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published on this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal Article
TL;DR: This paper introduces a method for real-time selective root finding from linear prediction (LP) coefficients using a combination of spectral peak picking and complex contour integration (CI) using the group delay function (GDF).

11 citations
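To make the idea of selective root finding concrete, here is a minimal NumPy sketch, assuming LP coefficients a = [1, a1, ..., ap] for A(z) = sum_k a[k] z^-k. It picks peaks of the LP envelope 1/|A(e^jw)| and then counts the roots of A(z) inside a narrow annular sector around each peak using the argument principle (the winding number of A along the sector boundary). This is not the paper's algorithm: it sums phase increments instead of integrating the group delay function, and the function names, sector radii and widths are illustrative assumptions.

```python
import numpy as np


def lp_envelope_peaks(a, n_fft=512):
    """Angles (rad, 0..pi) of local maxima of the LP envelope 1/|A(e^{jw})|."""
    A = np.fft.rfft(a, n_fft)                 # A(e^{jw}) at w = 2*pi*k/n_fft
    env = 1.0 / np.abs(A)
    inner = env[1:-1]
    idx = np.where((inner > env[:-2]) & (inner > env[2:]))[0] + 1
    return idx * 2.0 * np.pi / n_fft


def count_roots_in_sector(a, r_in, r_out, th0, th1, n=2000):
    """Count zeros of A(z) with r_in < |z| < r_out and th0 < arg z < th1,
    via the winding number of A along the (counterclockwise) sector boundary."""
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    outer = r_out * np.exp(1j * (th0 + (th1 - th0) * t))    # arc th0 -> th1
    down = (r_out + (r_in - r_out) * t) * np.exp(1j * th1)  # ray, outer -> inner
    inner = r_in * np.exp(1j * (th1 + (th0 - th1) * t))     # arc th1 -> th0
    up = (r_in + (r_out - r_in) * t) * np.exp(1j * th0)     # ray, inner -> outer
    z = np.concatenate([outer, down, inner, up, outer[:1]])  # closed contour
    # z^p * A(z) = polyval(a, z) has the same zeros as A(z) away from z = 0
    phase = np.angle(np.polyval(a, z))
    dphi = np.angle(np.exp(1j * np.diff(phase)))            # wrap steps to (-pi, pi]
    return int(round(dphi.sum() / (2.0 * np.pi)))


def roots_near_peaks(a, r_in=0.8, r_out=1.0, half_width=0.1):
    """For each envelope peak, count LP roots in a narrow sector around it."""
    return {float(w): count_roots_in_sector(a, r_in, r_out, w - half_width, w + half_width)
            for w in lp_envelope_peaks(a)}
```

Only sectors that contain a root would then need a root-refinement step, which is the point of being selective; the sampling density n must be high enough that the phase of A changes by less than pi between consecutive contour points.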

Proceedings Article
03 Mar 2021
TL;DR: The authors found that spoken corpora used in the training and evaluation of widely used ASR systems are, in fact, biased against AAL and likely contribute to poorer ASR performance for Black users.
Abstract: Recent work has revealed that major automatic speech recognition (ASR) systems such as those from Apple, Amazon, Google, IBM, and Microsoft perform much more poorly for Black U.S. speakers than for white U.S. speakers. Researchers postulate that this may be a result of biased datasets which are largely racially homogeneous. However, while the study of ASR performance with regard to the intersection of racial identity and language use is slowly gaining traction within AI, machine learning, and algorithmic bias research, little to nothing has been done to examine the data drawn from the spoken corpora commonly used in the training and evaluation of ASR systems in order to understand whether or not they are actually biased. This study seeks to begin addressing this gap in the research by investigating spoken corpora used for ASR training and evaluation for a grammatical feature of what the field of linguistics terms African American Language (AAL), a systematic, rule-governed, and legitimate linguistic variety spoken by many (but not all) African Americans in the U.S. This grammatical feature, habitual 'be', is an uninflected form of 'be' that encodes habituality, as in "I be in my office by 7:30am", paraphrasable as "I am usually in my office by 7:30" in Standardized American English. This study applies established corpus linguistics methods to the transcribed data of four major spoken corpora (Switchboard, Fisher, TIMIT, and LibriSpeech) to understand the frequency, distribution, and usage of habitual 'be' within each corpus, compared with a reference corpus of spoken AAL, the Corpus of Regional African American Language (CORAAL). The results show that habitual 'be' appears far less frequently, is dispersed across far fewer transcribed texts, and is surrounded by a much less diverse set of word types and parts of speech in the four ASR corpora than in CORAAL.
This work provides foundational evidence that spoken corpora used in the training and evaluation of widely used ASR systems are, in fact, biased against AAL and likely contribute to poorer ASR performance for Black users.

11 citations
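The three corpus measures named in the abstract (frequency, dispersion across texts, and diversity of surrounding word types) can be sketched as follows. This is a minimal Python sketch under loud assumptions: candidate habitual 'be' tokens are found with a crude, hypothetical left-context filter (NON_HABITUAL_LEFT) rather than the authors' annotation procedure, "range" is simply the share of transcripts with at least one hit, and diversity is reduced to the number of distinct right-hand collocates.

```python
import re
from collections import Counter

# Hypothetical left contexts that usually signal non-habitual (infinitival or
# modal) "be". Illustration only; real studies annotate or POS-tag candidates.
NON_HABITUAL_LEFT = {"to", "will", "would", "can", "could", "should", "shall",
                     "may", "might", "must", "gonna", "wanna"}


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def habitual_be_stats(texts):
    """Normalized frequency, range (dispersion) and right-collocate diversity."""
    total_tokens = hits = texts_with_hit = 0
    right = Counter()
    for text in texts:
        toks = tokenize(text)
        # candidate = "be" that is not utterance-initial and not after a modal/"to"
        idx = [i for i, t in enumerate(toks)
               if t == "be" and i > 0 and toks[i - 1] not in NON_HABITUAL_LEFT]
        total_tokens += len(toks)
        hits += len(idx)
        texts_with_hit += bool(idx)
        right.update(toks[i + 1] for i in idx if i + 1 < len(toks))
    return {
        "hits": hits,
        "per_million_words": 1e6 * hits / total_tokens if total_tokens else 0.0,
        "range": texts_with_hit / len(texts) if texts else 0.0,
        "right_collocate_types": len(right),
    }
```

Running the same function over each ASR corpus and over the reference corpus gives directly comparable numbers, because the frequency is normalized per million words and the range is a proportion of texts.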

01 Oct 2008
TL;DR: A new hybrid Vector Quantization / Gaussian Mixture Model (VQ/GMM) is introduced to improve the recognition rate of a speaker identification system, and experimental results show that the hybrid VQ/GMM achieves the best result among the five types of classifier compared.
Abstract: Speaker recognition is the process of recognizing a person on the basis of his or her voice signal. In this paper we provide a brief overview of the evolution of pattern classification techniques used in speaker recognition. The most common approach to speaker recognition is the use of a global Gaussian Mixture Model (GMM). The dominant advantage of the GMM approach is that speaker identification can be performed in a completely text-independent setting. In addition, because GMMs are based on a probabilistic framework, they provide high-accuracy recognition. However, GMM techniques do not work well in some situations because they ignore knowledge of the underlying phonetic content of the speech. To overcome these shortcomings, a new classification model is developed: we introduce a hybrid Vector Quantization / Gaussian Mixture Model (VQ/GMM) to improve the recognition rate of the speaker identification system. We also compare the performance of hybrid VQ/GMM, DTW, VQ, GMM and SVM techniques for speaker identification. The paper describes how the hybrid VQ/GMM is constructed for speaker identification and presents experimental results for these five techniques. Experiments were performed on the TIMIT speech database, and the results show that the hybrid VQ/GMM achieves the best result among the five classifiers.

11 citations
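The abstract does not say how the hybrid VQ/GMM is built, so the minimal NumPy sketch below shows only the plain VQ baseline it is compared against: a k-means codebook per speaker, and identification by the lowest average quantization distortion. The features are assumed to be precomputed frame vectors (for example MFCCs), and the function names and codebook size are illustrative assumptions.

```python
import numpy as np


def train_codebook(features, k, iters=20, seed=0):
    """k-means vector-quantization codebook (k codewords) for one speaker."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # squared Euclidean distance from every frame to every codeword
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members):                  # leave empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook


def avg_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()


def identify(features, codebooks):
    """Return the speaker whose codebook quantizes the test frames best."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))
```

Because the score is an average over frames, the decision does not depend on frame order, which is why VQ, like GMM, suits text-independent identification.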

Proceedings Article
01 Dec 2011
TL;DR: This paper formulates the exemplar-based classification paradigm as a sparse representation (SR) problem, explores the use of convex hull constraints to enforce both regularization and sparsity, and uses the Extended Baum-Welch (EBW) optimization technique to solve the SR problem.
Abstract: In this paper, we propose a novel exemplar-based technique for classification problems in which, for every new test sample, the classification model is re-estimated from a subset of relevant samples of the training data. We formulate the exemplar-based classification paradigm as a sparse representation (SR) problem, and explore the use of convex hull constraints to enforce both regularization and sparsity. Finally, we use the Extended Baum-Welch (EBW) optimization technique to solve the SR problem. We evaluate the proposed methodology on the TIMIT phonetic classification task, showing that it offers statistically significant improvements over common classification methods and achieves an accuracy of 82.9%, the best single-classifier number reported to date.

10 citations
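The convex-hull-constrained SR problem can be sketched as follows. This minimal NumPy sketch assumes the objective F(beta) = -||y - H beta||^2 over the probability simplex (beta >= 0, sum of beta = 1) and applies an EBW-style growth-transform update, beta_i <- beta_i (dF/dbeta_i + D) / sum_j beta_j (dF/dbeta_j + D), with D chosen each iteration to keep every factor positive. The nearest-exemplar preselection, the constant c, and the class-residual decision rule are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np


def ebw_simplex_sr(H, y, iters=200, c=2.0):
    """Approximately minimize ||y - H beta||^2 with beta on the simplex (the
    convex hull of H's exemplar columns), via an EBW-style growth transform on
    the assumed objective F(beta) = -||y - H beta||^2."""
    n = H.shape[1]
    beta = np.full(n, 1.0 / n)                # start at the centre of the simplex
    for _ in range(iters):
        grad = 2.0 * H.T @ (y - H @ beta)     # dF/dbeta
        D = max(0.0, -grad.min()) + c         # every (grad_i + D) > 0; larger c = smaller steps
        num = beta * (grad + D)
        beta = num / num.sum()                # stays >= 0 and sums to 1
    return beta


def classify(H_train, labels, y, k=50, iters=200):
    """Exemplar-based SR classification of one test vector y."""
    # Hypothetical preselection: keep the k training exemplars closest to y
    dist = np.linalg.norm(H_train - y[:, None], axis=0)
    idx = np.argsort(dist)[:min(k, H_train.shape[1])]
    H, lab = H_train[:, idx], labels[idx]
    beta = ebw_simplex_sr(H, y, iters)
    # Assign y to the class whose exemplars alone reconstruct it best
    residuals = {cls: np.linalg.norm(y - H[:, lab == cls] @ beta[lab == cls])
                 for cls in np.unique(lab)}
    return min(residuals, key=residuals.get), beta
```

The multiplicative form is what makes this family of updates attractive for the convex hull constraint: non-negativity and the sum-to-one condition hold after every step without any projection.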

Journal Article
TL;DR: The results show that the i-vector method outperforms the GMM-UBM approach and other state-of-the-art methods under specific conditions, and that fusion techniques can be used to improve robustness to noise and handset effects.
Abstract: In this article, a novel combination of i-vectors and an Extreme Learning Machine (ELM) is proposed for speaker identification. The ELM is chosen because it is fast to train and has the universal approximation property. Four combinations of features based on Mel Frequency Cepstral Coefficients and Power Normalized Cepstral Coefficients are used. In addition, seven fusion methods are explored. The system is evaluated on three different databases, namely SITW 2016, NIST 2008, and TIMIT. This work is the first to employ the 2016 SITW database for speaker identification using the integration of the ELM and i-vector approaches. From each database, 120 speakers with 1200 speech utterances are used (360 speakers with 3600 speech utterances overall). Furthermore, comprehensive evaluations are carried out with a wide range of realistic background noise types (stationary AWGN and non-stationary noise types) together with the handset effect. The proposed system is compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and other state-of-the-art approaches. The results show that the i-vector method outperforms the GMM-UBM approach and other state-of-the-art methods under specific conditions, and that fusion techniques can be used to improve robustness to noise and handset effects.

10 citations
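The ELM half of the proposed system is simple enough to sketch in a few lines. This minimal NumPy sketch assumes the i-vectors have already been extracted as fixed-length vectors (the i-vector extractor itself is out of scope); the tanh activation, the hidden-layer size and the 1/sqrt(d) input-weight scaling are assumptions, not the paper's settings. The defining ELM property is kept: input weights are random and never trained, and only the output weights are solved, in closed form.

```python
import numpy as np


class ELM:
    """Extreme Learning Machine: one hidden layer with random, untrained input
    weights; only the output weights are fitted, by least squares."""

    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        d = X.shape[1]
        self.W = self.rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, self.n_hidden))
        self.b = self.rng.normal(0.0, 1.0, size=self.n_hidden)
        # Least-squares output weights via the Moore-Penrose pseudo-inverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Training cost is one pseudo-inverse of an (n_samples x n_hidden) matrix, which is the "fast to train" property the abstract cites as the reason for choosing an ELM.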


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance metrics
Number of papers on this topic in previous years:
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95