Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.

Papers

Journal Article

Fast and robust formant detection from LP data

Thorsten Smit¹, Friedrich Türckheim¹, Robert Mores¹

Hamburg University of Applied Sciences¹

01 Sep 2012, Speech Communication

TL;DR: This paper introduces a method for real-time selective root finding from linear prediction (LP) coefficients that combines spectral peak picking with complex contour integration (CI) based on the group delay function (GDF).

11 citations
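
The TL;DR names spectral peak picking on the LP envelope as one ingredient of the method. As a minimal sketch of that ingredient only, assuming a uniform frequency grid, the code evaluates the LP envelope 1/|A(e^{jω})| and returns its local maxima as formant candidates. It does not reproduce the paper's selective root finding or its complex contour integration with the group delay function; the function names, grid size, and the toy two-resonance LP model are hypothetical choices for illustration.

```python
import cmath
import math

# Hypothetical helper names; a plain peak-picking sketch, not the paper's CI/GDF method.


def poly_from_poles(poles):
    """Expand A(z) = prod_k (1 - p_k z^-1) into LP coefficients [1, a1, ..., ap]."""
    coeffs = [1 + 0j]
    for p in poles:
        nxt = coeffs + [0j]
        for i in range(1, len(nxt)):
            nxt[i] -= p * coeffs[i - 1]
        coeffs = nxt
    # Conjugate pole pairs make the imaginary parts cancel (up to rounding).
    return [c.real for c in coeffs]


def lp_envelope(a, fs, n_points=512):
    """Evaluate 1/|A(e^{jw})| on a uniform grid over [0, fs/2)."""
    freqs, mags = [], []
    for k in range(n_points):
        w = math.pi * k / n_points
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        freqs.append(w * fs / (2 * math.pi))
        mags.append(1.0 / abs(a_w))
    return freqs, mags


def pick_formant_candidates(a, fs, n_points=512):
    """Return the frequencies (Hz) of local maxima of the LP envelope."""
    freqs, mags = lp_envelope(a, fs, n_points)
    return [
        freqs[k]
        for k in range(1, n_points - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]


# Toy LP model with two resonances at 500 Hz and 1500 Hz (pole radius 0.97).
fs = 8000
poles = []
for f, r in [(500, 0.97), (1500, 0.97)]:
    p = r * cmath.exp(1j * 2 * math.pi * f / fs)
    poles += [p, p.conjugate()]
a = poly_from_poles(poles)
candidates = pick_formant_candidates(a, fs)
print([round(x) for x in candidates])
```

On this grid (about 7.8 Hz per bin at 8 kHz), candidates are only as precise as the bin spacing, which is one reason a refinement step such as root finding can be attractive.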

Proceedings Article

Spoken Corpora Data, Automatic Speech Recognition, and Bias Against African American Language: The case of Habitual 'Be'

Joshua Martin¹

University of Florida¹

03 Mar 2021

TL;DR: The study finds that spoken corpora used in the training and evaluation of widely used ASR systems are, in fact, biased against AAL and likely contribute to poorer ASR performance for Black users.

Abstract: Recent work has revealed that major automatic speech recognition (ASR) systems, such as those from Apple, Amazon, Google, IBM, and Microsoft, perform much more poorly for Black U.S. speakers than for white U.S. speakers. Researchers postulate that this may result from biased datasets that are largely racially homogeneous. However, while the study of ASR performance with regard to the intersection of racial identity and language use is slowly gaining traction within AI, machine learning, and algorithmic bias research, little to nothing has been done to examine the data drawn from the spoken corpora commonly used to train and evaluate ASR systems in order to understand whether they are actually biased. This study begins to address this gap by investigating spoken corpora used for ASR training and evaluation for a grammatical feature of what the field of linguistics terms African American Language (AAL), a systematic, rule-governed, and legitimate linguistic variety spoken by many (but not all) African Americans in the U.S. This grammatical feature, habitual 'be', is an uninflected form of 'be' that encodes habituality, as in "I be in my office by 7:30am", paraphrasable as "I am usually in my office by 7:30" in Standardized American English. The study applies established corpus linguistics methods to the transcribed data of four major spoken corpora (Switchboard, Fisher, TIMIT, and LibriSpeech) to measure the frequency, distribution, and usage of habitual 'be' in each corpus, compared with a reference corpus of spoken AAL, the Corpus of Regional African American Language (CORAAL). The results show that habitual 'be' appears far less frequently, is dispersed across far fewer transcribed texts, and is surrounded by a much less diverse set of word types and parts of speech in the four ASR corpora than in CORAAL.
This work provides foundational evidence that spoken corpora used in the training and evaluation of widely used ASR systems are, in fact, biased against AAL and likely contribute to poorer ASR performance for Black users.

11 citations
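
The corpus comparison rests on standard corpus-linguistics measures. As a minimal sketch, assuming simple regex tokenization, the function computes a raw count, a frequency normalized per million tokens, the range (share of texts containing the form), and the number of distinct word types within a ±2-token window around each hit. It naively counts every 'be' token, so non-habitual uses such as "to be" would also be counted; the study's actual filtering, dispersion statistic, and part-of-speech analysis are not reproduced, and the function name and mini-corpus are hypothetical.

```python
import re

# Hypothetical helper names; counts every "be" token, not only habitual "be".


def tokenize(text):
    """Lowercase word tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z']+", text.lower())


def target_stats(texts, target="be", window=2):
    """Raw count, per-million frequency, range, and collocate types of `target`."""
    total_tokens = hits = texts_with_hit = 0
    collocates = set()
    for text in texts:
        tokens = tokenize(text)
        total_tokens += len(tokens)
        positions = [i for i, tok in enumerate(tokens) if tok == target]
        hits += len(positions)
        if positions:
            texts_with_hit += 1
        for i in positions:
            # Word types within +/- `window` tokens of each hit.
            collocates.update(tokens[max(0, i - window):i])
            collocates.update(tokens[i + 1:i + 1 + window])
    return {
        "raw": hits,
        "per_million": hits / total_tokens * 1_000_000 if total_tokens else 0.0,
        "range": texts_with_hit / len(texts) if texts else 0.0,
        "collocate_types": len(collocates),
    }


# Hypothetical mini-corpus; the first line is the example sentence from the abstract.
toy_corpus = [
    "I be in my office by 7:30am",
    "She is at work",
    "They be working late",
]
stats = target_stats(toy_corpus)
print(stats)
```

Comparing these numbers between an ASR corpus and a reference corpus such as CORAAL is what makes them informative; on their own they only describe one corpus.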

Text-Independent Speaker Identification Using Hybrid Vector Quantization / Gaussian Mixture Models Pattern Classifier

Loh Mun Yee, Abdul Manan Ahmad

01 Oct 2008

TL;DR: A new hybrid Vector Quantization / Gaussian Mixture Models (VQ/GMM) model is introduced to improve the recognition rate of speaker identification systems, and experimental results show that hybrid VQ/GMM achieves the best result among five types of classifier.

Abstract: Speaker recognition is the process of recognizing a person on the basis of his or her voice signal. In this paper we provide a brief overview of the evolution of pattern classification techniques used in speaker recognition. The most common approach to speaker recognition is the global Gaussian Mixture Model (GMM). The dominant advantage of the GMM approach is that speaker identification can be performed in a completely text-independent setting. In addition, because GMMs are based on a probabilistic framework, they provide high recognition accuracy. However, GMM techniques do not work well in some situations because they ignore knowledge of the underlying phonetic content of the speech. To overcome these shortcomings, we introduce a new hybrid Vector Quantization / Gaussian Mixture Models (VQ/GMM) model to improve the recognition rate of the speaker identification system. We also compare the performance of hybrid VQ/GMM, DTW, VQ, GMM, and SVM techniques for speaker identification. The paper describes how the hybrid VQ/GMM is constructed for speaker identification and presents experimental results for these five techniques. Experiments were performed on the TIMIT speech database. The experimental results show that hybrid VQ/GMM achieves the best result among the five classifiers.

11 citations
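
The abstract does not detail how the hybrid VQ/GMM is built, so this sketch shows only the plain VQ baseline among the five compared classifiers, under stated assumptions: each speaker gets a k-means codebook of feature vectors, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. The function names, codebook size, and the 2-D toy vectors (standing in for MFCC frames) are hypothetical.

```python
import random

# Hypothetical names; a plain VQ speaker-identification baseline, not the paper's hybrid VQ/GMM.


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iters=20, seed=0):
    """k-means (Lloyd) codebook; LBG splitting is a common alternative."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                codebook[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(test_vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))


# Toy 2-D vectors stand in for per-frame MFCCs of two hypothetical speakers.
rng = random.Random(1)
train = {
    "spk_a": [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)],
    "spk_b": [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)],
}
codebooks = {spk: train_codebook(vecs, size=4) for spk, vecs in train.items()}
test_utterance = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(20)]
print(identify(test_utterance, codebooks))
```

Like a GMM, a VQ codebook ignores frame order, which is what makes both approaches usable in a text-independent setting.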

Proceedings Article

A convex hull approach to sparse representations for exemplar-based speech recognition

Tara N. Sainath¹, David Nahamoo¹, Dimitri Kanevsky¹, Bhuvana Ramabhadran¹, Parikshit Shah²

IBM¹, Massachusetts Institute of Technology²

01 Dec 2011

TL;DR: This paper formulates the exemplar-based classification paradigm as a sparse representation (SR) problem, explores the use of convex hull constraints to enforce both regularization and sparsity, and uses the Extended Baum-Welch (EBW) optimization technique to solve the SR problem.

Abstract: In this paper, we propose a novel exemplar-based technique for classification problems where, for every new test sample, the classification model is re-estimated from a subset of relevant samples of the training data. We formulate the exemplar-based classification paradigm as a sparse representation (SR) problem, and explore the use of convex hull constraints to enforce both regularization and sparsity. Finally, we utilize the Extended Baum-Welch (EBW) optimization technique to solve the SR problem. We explore our proposed methodology on the TIMIT phonetic classification task, showing that our proposed method offers statistically significant improvements over common classification methods, and provides an accuracy of 82.9%, the best single-classifier number reported to date.

10 citations
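
The convex hull constraint can be read as requiring the weights β on the training exemplars to be nonnegative and sum to one, with the test vector y approximated by Hβ. This sketch solves min ½‖y − Hβ‖² over that simplex with projected gradient descent, which is a swapped-in solver, not the Extended Baum-Welch technique the paper uses, and it omits the per-sample selection of relevant training exemplars. The decision rule (class with the largest total weight), the function names, and the toy exemplars are assumptions for illustration.

```python
# Hypothetical names; projected gradient replaces the paper's EBW solver.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def convex_hull_weights(y, exemplars, iters=500):
    """Minimise 0.5 * ||y - H beta||^2 over the simplex by projected gradient."""
    n, dim = len(exemplars), len(y)
    beta = [1.0 / n] * n
    # ||H||_F^2 bounds the gradient's Lipschitz constant, so 1 / ||H||_F^2 is a safe step.
    step = 1.0 / sum(dot(h, h) for h in exemplars)
    for _ in range(iters):
        recon = [sum(b * h[d] for b, h in zip(beta, exemplars)) for d in range(dim)]
        resid = [yd - rd for yd, rd in zip(y, recon)]
        grad = [-dot(h, resid) for h in exemplars]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)])
    return beta


def classify(y, exemplars, labels):
    """Assign y to the class whose exemplars receive the most total weight."""
    support = {}
    for b, lab in zip(convex_hull_weights(y, exemplars), labels):
        support[lab] = support.get(lab, 0.0) + b
    return max(support, key=support.get)


# Hypothetical 2-D exemplars for two phone classes.
exemplars = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["a", "a", "b", "b"]
print(classify((0.95, 0.05), exemplars, labels))
print(classify((0.05, 0.95), exemplars, labels))
```

Because the weights must sum to one, the simplex constraint both regularizes the fit and tends to produce sparse solutions, which matches the role the TL;DR gives the convex hull constraint.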

Journal Article

Combined i-Vector and Extreme Learning Machine Approach for Robust Speaker Identification and Evaluation with SITW 2016, NIST 2008, TIMIT Databases

Musab T. S. Al-Kaltakchi¹, Mohammed A. M. Abdullah, Wai Lok Woo², Satnam Dlay³

Al-Mustansiriya University¹, Northumbria University², Newcastle University³

25 Mar 2021, Circuits, Systems, and Signal Processing

TL;DR: The results show that the i-vector method outperforms the GMM-UBM approach and other state-of-the-art methods under specific conditions, and that fusion techniques can be used to improve robustness to noise and handset effects.

Abstract: In this article, a novel combination of i-vectors and an Extreme Learning Machine (ELM) is proposed for speaker identification. The ELM is chosen because it is fast to train and has the universal approximation property. Four feature combinations based on Mel Frequency Cepstral Coefficients and Power Normalized Cepstral Coefficients are used. In addition, seven fusion methods are exploited. The system is evaluated on three databases: SITW 2016, NIST 2008, and TIMIT. This work is the first to use the 2016 SITW database for speaker identification with the integrated ELM and i-vector approach. From each database, 120 speakers with 1200 speech utterances are used (360 speakers with 3600 speech utterances overall). Furthermore, comprehensive evaluations are carried out with a wide range of realistic background noise types (stationary AWGN and non-stationary noise types) together with the handset effect. The proposed system is compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and other state-of-the-art approaches. The results show that the i-vector method outperforms the GMM-UBM approach and other state-of-the-art methods under specific conditions, and that fusion techniques can be used to improve robustness to noise and handset effects.

10 citations
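
An ELM is fast to train because its hidden-layer weights are drawn at random and then frozen; only the output weights are fitted, in closed form. A minimal sketch, assuming a sigmoid hidden layer and a ridge-regularized least-squares solve for the output weights, follows. i-vector extraction and the paper's feature and fusion setups are not reproduced; toy 2-D vectors stand in for i-vectors, and the class, function, and parameter names are hypothetical.

```python
import math
import random

# Hypothetical names; a generic regularized ELM, not the paper's exact configuration.


def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a_row) + list(b_row) for a_row, b_row in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


class ELMClassifier:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=20, reg=1e-2, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _hidden(self, x):
        # Sigmoid activations of the random projection of x.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.W, self.b)]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.classes = sorted(set(y))
        d, L = len(X[0]), self.n_hidden
        self.W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        T = [[1.0 if c == lab else 0.0 for c in self.classes] for lab in y]
        # Ridge-regularized normal equations: (H^T H + reg * I) beta = H^T T.
        HtH = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(len(self.classes))]
               for i in range(L)]
        self.beta = solve(HtH, HtT)
        return self

    def predict(self, x):
        h = self._hidden(x)
        scores = [sum(hi * row[k] for hi, row in zip(h, self.beta))
                  for k in range(len(self.classes))]
        return self.classes[max(range(len(scores)), key=scores.__getitem__)]


# Toy 2-D vectors stand in for i-vectors of two hypothetical speakers.
rng = random.Random(3)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)]
X += [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(40)]
y = ["spk_a"] * 40 + ["spk_b"] * 40
model = ELMClassifier().fit(X, y)
print(model.predict((0.0, 0.0)), model.predict((4.0, 4.0)))
```

Training cost is dominated by one linear solve whose size is set by the number of hidden units, not by iterative backpropagation, which is the sense in which an ELM is fast to train.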

Metrics

Papers: 1,488

Citations: 68,688

No. of papers in the topic in previous years
Year	Papers
2023	24
2022	62
2021	67
2020	86
2019	77
2018	95
