Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 2006
TL;DR: In this article, a structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation, where the dynamics of formants or vocal tract resonances (VTRs) in fluent speech are generated using prior information of resonance targets in the phone sequence, in absence of acoustic data.
Abstract: A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonances (VTRs) in fluent speech is generated using prior information of resonance targets in the phone sequence, in the absence of acoustic data. Bidirectional temporal filtering with finite-impulse response (FIR) is applied to the segmental target sequence as the FIR filter’s input, where forward filtering produces anticipatory coarticulation and backward filtering produces regressive coarticulation. The filtering process is also shown to result in realistic resonance-frequency undershooting or reduction for fast-rate and low-effort speech in a contextually assimilated manner. At the second stage, the dynamics of speech cepstra are predicted analytically based on the FIR-filtered and speaker-adapted VTR targets, and the prediction residuals are modeled by Gaussian random variables with trainable parameters. The combined system of these two stages, thus, generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. We present details of model simulation demonstrating quantitative effects of speaking rate and segment duration on the magnitude of reduction, agreeing closely with experimental measurement results in the acoustic-phonetic literature. This two-stage model is implemented and applied to the TIMIT phonetic recognition task. Using the N-best (N = 2000) rescoring paradigm, the new model, which contains only context-independent parameters, is shown to significantly reduce the phone error rate of a standard hidden Markov model (HMM) system under the same experimental conditions.
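As a rough illustration of the first stage, the sketch below applies a bidirectional truncated-exponential FIR smoother to a per-frame sequence of resonance targets. The kernel shape, the gamma and taps values, and the example F1 targets and durations are illustrative assumptions, not the paper's trained parameters.

```python
# Illustrative sketch only: bidirectional FIR smoothing of per-frame vocal
# tract resonance (VTR) targets. Kernel shape, gamma, taps, and the example
# targets/durations are assumptions, not the paper's trained parameters.
import numpy as np

def bidirectional_fir(targets, gamma=0.6, taps=7):
    """Run a truncated-exponential FIR smoother forward and backward over the
    target track and average the two passes."""
    h = gamma ** np.arange(taps)
    h /= h.sum()                                  # unit-gain smoothing kernel
    pad = taps - 1

    def causal(x):
        xp = np.pad(x, (pad, 0), mode='edge')
        return np.convolve(xp, h)[pad:pad + len(x)]

    fwd = causal(targets)                         # mixes in earlier-frame targets
    bwd = causal(targets[::-1])[::-1]             # mixes in later-frame targets
    return 0.5 * (fwd + bwd)

# Hypothetical F1 targets (Hz) for three phones; the short 4-frame middle
# segment never reaches its 300 Hz target, i.e. undershoot/reduction.
targets = np.repeat([500.0, 300.0, 650.0], [12, 4, 10])
print(bidirectional_fir(targets).round(1))
```

Under these settings the smoothed track for the short segment stays well above its target, which is the kind of duration-dependent reduction the model is meant to capture.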

23 citations

Journal ArticleDOI
TL;DR: A fast likelihood computation approach called dynamic Gaussian selection (DGS) is proposed; it is a one-pass search technique that generates a dynamic shortlist of Gaussians for each state during likelihood computation.
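The paper defines its own DGS criterion; as a generic illustration of building a per-frame shortlist while the likelihood is being computed, the sketch below accumulates each Gaussian's log-score dimension by dimension and abandons it once the partial score falls a fixed beam below the current best. The pruning rule and the beam value are assumptions, not necessarily the paper's.

```python
# Generic sketch of shortlist-style Gaussian selection for one HMM state.
# The pruning rule (drop a component once its partial log-score falls `beam`
# below the best so far) and the beam value are illustrative assumptions.
import numpy as np

def state_loglik_shortlist(x, means, inv_vars, log_consts, weights, beam=10.0):
    """Log-likelihood of frame x under a diagonal-covariance GMM state,
    evaluating only the components that survive partial-score pruning."""
    best = -np.inf
    kept = []
    for m, iv, lc, w in zip(means, inv_vars, log_consts, weights):
        ll = lc + np.log(w)
        for d in range(len(x)):
            ll -= 0.5 * (x[d] - m[d]) ** 2 * iv[d]
            if ll < best - beam:         # cannot contribute noticeably: prune
                break
        else:
            kept.append(ll)              # survived: part of the dynamic shortlist
            best = max(best, ll)
    kept = np.asarray(kept)
    return best + np.log(np.exp(kept - best).sum())

# Hypothetical 2-D, 3-component state: the far-away component gets pruned.
means = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, -1.0]])
inv_vars = np.ones_like(means)
log_consts = np.full(3, -np.log(2 * np.pi))      # unit-variance normalizer in 2-D
weights = np.array([0.5, 0.3, 0.2])
print(state_loglik_shortlist(np.array([0.2, -0.1]), means, inv_vars, log_consts, weights))
```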

23 citations

Proceedings ArticleDOI
20 Mar 2016
TL;DR: The proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and segmental SNR measure and is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms.
Abstract: In this paper, we propose a minimum mean square error spectral estimator for clean speech spectral amplitudes that uses a Kalman filter to model the temporal dynamics of the spectral amplitudes in the modulation domain. Using a two-parameter Gamma distribution to model the prior distribution of the speech spectral amplitudes, we derive closed form expressions for the posterior mean and variance of the spectral amplitudes as well as for the associated update step of the Kalman filter. The performance of the proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and segmental SNR measure and is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms.
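The closed-form Gamma-prior posterior is the paper's own contribution; the toy sketch below only shows the surrounding structure, a scalar Kalman filter per frequency bin tracking spectral-amplitude dynamics from frame to frame. The first-order AR transition and the noise variances are assumptions for illustration, not the paper's derived update.

```python
# Toy sketch: per-frequency-bin scalar Kalman filtering of spectral amplitudes
# across frames. The AR coefficient `a` and variances `q`, `r` are assumptions.
import numpy as np

def kalman_track_amplitudes(noisy_amps, a=0.9, q=0.05, r=0.2):
    """noisy_amps: (frames, bins) array of noisy spectral amplitudes,
    e.g. np.abs() of an STFT. Returns smoothed estimates of the same shape."""
    frames, bins = noisy_amps.shape
    x = noisy_amps[0].astype(float).copy()   # state estimate per bin
    p = np.full(bins, 1.0)                   # state variance per bin
    out = np.empty((frames, bins))
    out[0] = x
    for t in range(1, frames):
        # Predict: first-order AR model of amplitude dynamics (modulation domain).
        x_pred = a * x
        p_pred = a * a * p + q
        # Update with this frame's noisy observation.
        k = p_pred / (p_pred + r)            # Kalman gain
        x = x_pred + k * (noisy_amps[t] - x_pred)
        p = (1.0 - k) * p_pred
        out[t] = x
    return out
```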

23 citations

Proceedings ArticleDOI
15 Sep 2019
TL;DR: In this paper, the output-label permutation is considered as a discrete latent random variable with a uniform prior distribution and a log-likelihood function is defined based on the prior distributions and the separation errors of all permutations.
Abstract: Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the main obstacle in training neural networks for speech separation. Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error. In this study, we show that a major drawback of this technique is the overconfident choice of the output-label assignment, especially in the initial steps of training when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT) which considers the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distributions and the separation errors of all permutations; it trains the speech separation networks by maximizing the log-likelihood function. Prob-PIT can be easily implemented by replacing the minimum function of PIT with a soft-minimum function. We evaluate our approach for speech separation on both TIMIT and CHiME datasets. The results show that the proposed method significantly outperforms PIT in terms of Signal to Distortion Ratio and Signal to Interference Ratio.
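Since the abstract describes Prob-PIT as replacing PIT's hard minimum over output-label permutations with a soft-minimum, the sketch below computes both losses for a batch of separated signals. The mean-squared separation error and the temperature gamma are illustrative choices.

```python
# Sketch of PIT vs. Prob-PIT losses. The MSE separation error and the
# temperature `gamma` are illustrative assumptions.
import numpy as np
from itertools import permutations

def pit_losses(est, ref, gamma=1.0):
    """est, ref: (batch, sources, time) arrays. Returns (PIT, Prob-PIT)."""
    n_src = est.shape[1]
    errors = np.stack(
        [np.mean((est[:, list(p), :] - ref) ** 2, axis=(1, 2))
         for p in permutations(range(n_src))],
        axis=1)                                   # (batch, n_permutations)
    pit = errors.min(axis=1).mean()               # hard assignment: best permutation only
    # Soft-minimum via a numerically stable log-sum-exp over all permutations.
    z = -errors / gamma
    m = z.max(axis=1, keepdims=True)
    soft_min = -gamma * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))
    return pit, soft_min.mean()
```

As gamma shrinks, the soft-minimum approaches the hard minimum and Prob-PIT reduces to ordinary PIT; with a larger gamma every permutation contributes, which softens the early-training assignment decisions described above.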

23 citations

Proceedings ArticleDOI
17 Nov 2008
TL;DR: This paper presents a means to invert MFCCs for use in speech enhancement applications, and results for cepstral inversion are evaluated on the TIMIT speech corpus using the perceptual evaluation of speech quality (PESQ) measure.
Abstract: The use of Mel-frequency cepstral coefficients (MFCCs) is well established in the field of speech processing, particularly for speaker modeling within a Gaussian mixture model (GMM) speaker recognition system. The use of GMMs for speech enhancement applications has only recently been proposed in the literature; the concept of direct inversion of the MFCCs, however, has not been studied. In this paper we present a means to invert MFCCs for use in speech enhancement applications. Results for cepstral inversion are evaluated on the TIMIT speech corpus using the perceptual evaluation of speech quality (PESQ) measure.
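The paper's inversion procedure is its own; as a generic sketch of what inverting MFCCs involves, the code below zero-pads the truncated cepstrum, undoes the DCT and the log, and maps mel energies back to a linear-frequency power spectrum with a pseudo-inverse of the mel filterbank. The sample rate, filterbank sizes, and the use of a natural log are assumptions.

```python
# Generic sketch of approximate MFCC inversion to a power spectrum. The
# sample rate, FFT size, filterbank size, and natural-log convention are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np
import librosa
from scipy.fft import idct

SR, N_FFT, N_MELS, N_MFCC = 16000, 512, 26, 13
mel_fb = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=N_MELS)  # (n_mels, 1 + n_fft//2)
mel_pinv = np.linalg.pinv(mel_fb)                                # least-squares inverse

def invert_mfcc(mfcc):
    """mfcc: (n_mfcc,) vector for one frame -> approximate power spectrum."""
    cep = np.zeros(N_MELS)
    cep[:len(mfcc)] = mfcc                      # truncated coefficients lose fine detail
    log_mel = idct(cep, type=2, norm='ortho')   # undo the DCT
    mel_energy = np.exp(log_mel)                # undo the log
    power = mel_pinv @ mel_energy               # undo the filterbank (approximately)
    return np.maximum(power, 0.0)               # clip negative leakage from the pseudo-inverse
```

Because the DCT truncation and the wide mel filters discard fine spectral detail, the recovered spectrum is only an approximation of the original.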

22 citations


Network Information

Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related

Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95