Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
28 Nov 2011
TL;DR: Experimental results show that the use of ASM posteriorgrams leads to consistently better detection performance than conventional GMM posteriorgrams.
Abstract: This paper describes a study on query-by-example spoken term detection (STD) using the acoustic segment modeling technique. Acoustic segment models (ASMs) are a set of hidden Markov models (HMM) that are obtained in an unsupervised manner without using any transcription information. The training of ASMs follows an iterative procedure, which consists of the steps of initial segmentation, segments labeling, and HMM parameter estimation. The ASMs are incorporated into a template-matching framework for query-by-example STD. Both the spoken query examples and the test utterances are represented by frame-level ASM posteriorgrams. Segmental dynamic time warping (DTW) is applied to match the query with the test utterance and locate the possible occurrences. The performance of the proposed approach is evaluated with different DTW local distance measures on the TIMIT and the Fisher Corpora respectively. Experimental results show that the use of ASM posteriorgrams leads to consistently better performance of detection than the conventional GMM posteriorgrams.
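The matching step described above can be sketched minimally: the core of segmental DTW is an ordinary DTW alignment between the query posteriorgram and (a window of) the test utterance, under some local distance between posteriorgram frames. The sketch below is an illustrative assumption, not the paper's implementation; the function name `dtw_distance` is hypothetical, and the negative-log inner product is just one of the local distance measures such systems compare.

```python
import numpy as np

def dtw_distance(query, utt):
    """Plain DTW between two posteriorgram sequences (frames x states).

    Local distance: negative log inner product between posterior
    vectors -- one common choice for posteriorgram matching (an
    assumption here; the paper evaluates several local distances).
    """
    eps = 1e-10
    # pairwise local distances between all query/utterance frame pairs
    D = -np.log(np.clip(query @ utt.T, eps, None))
    n, m = D.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # standard DTW recursion: best of insert/delete/match moves
            acc[i, j] = D[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)  # length-normalized alignment cost

# toy posteriorgrams: each row is a distribution over 8 ASM states
rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(8), size=20)   # 20 query frames
u = rng.dirichlet(np.ones(8), size=50)   # 50 utterance frames
print(dtw_distance(q, u))
```

In segmental DTW this alignment would be repeated over sliding windows of the utterance, keeping the lowest-cost window as a detected occurrence.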

28 citations

Proceedings ArticleDOI
31 Mar 2016
TL;DR: A ConvRBM with sampling from noisy rectified linear units (NReLUs) is trained in an unsupervised way to model speech signals of arbitrary length; the weights of the model can represent an auditory-like filterbank.
Abstract: A Convolutional Restricted Boltzmann Machine (ConvRBM) as a model for the speech signal is presented in this paper. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signals of arbitrary length. The weights of the model can represent an auditory-like filterbank. Our learned filterbank is also nonlinear with respect to the center frequencies of its subband filters, similar to standard filterbanks (such as Mel, Bark, and ERB). We have used the proposed model as a front-end to learn features and applied it to a speech recognition task. ConvRBM features improve over MFCC, with relative improvements of 5% on the TIMIT test set and 7% on the WSJ0 database for both Nov'92 test sets using GMM-HMM systems. With DNN-HMM systems, we achieved a relative improvement of 3% on the TIMIT test set over MFCC and the Mel filterbank (FBANK). On the WSJ0 Nov'92 test sets, we achieved relative improvements of 4–14% using ConvRBM features over MFCC features and 3.6–5.6% using the ConvRBM filterbank over FBANK features.
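For context on the "nonlinear with respect to center frequencies" comparison: a standard Mel filterbank spaces its filters uniformly on the Mel scale, so the center frequencies in Hz are nonlinearly (roughly logarithmically) spaced. A minimal sketch, assuming the common 2595·log10(1 + f/700) Mel formula; the helper name `mel_center_freqs` is hypothetical:

```python
import numpy as np

def mel_center_freqs(n_filters, fmin=0.0, fmax=8000.0):
    """Center frequencies (Hz) of a Mel filterbank.

    Filters are equally spaced on the Mel scale, hence increasingly
    far apart in Hz -- the kind of nonlinear spacing a learned
    auditory-like filterbank is compared against.
    """
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(mel(fmin), mel(fmax), n_filters + 2)
    return inv_mel(mels[1:-1])  # drop the two band-edge points

centers = mel_center_freqs(26)
# spacing grows with frequency: low-band gaps are much smaller
print(np.diff(centers)[:3], np.diff(centers)[-3:])
```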

28 citations

Proceedings ArticleDOI
15 Apr 2007
TL;DR: To integrate the two very different modules, Brno's phone recognizer is modified into a phone lattice hypothesizer that produces high-quality phone lattices, which are fed directly into the knowledge-based module for rescoring.
Abstract: This study is the result of a collaboration between two groups, one from Brno University of Technology and the other from the Georgia Institute of Technology (GT). The Brno recognizer is known to outperform many state-of-the-art systems on phone recognition, while the GT knowledge-based lattice rescoring module has been shown to improve system performance on a number of speech recognition tasks. We believe a combination of the two systems results in high-accuracy phone recognition. To integrate the two very different modules, we modify Brno's phone recognizer into a phone lattice hypothesizer that produces high-quality phone lattices, and feed them directly into the knowledge-based module to rescore the lattices. We test the combined system on the TIMIT continuous phone recognition task without retraining the individual subsystems, and observe that the phone error rate is reduced to 19.78% from the 24.41% produced by the Brno phone recognizer alone. To the best of the authors' knowledge, this result represents the lowest error rate reported to date on the TIMIT continuous phone recognition task.

28 citations

Proceedings ArticleDOI
12 Nov 2015
TL;DR: It is shown that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency modulated excitations on the TIMIT dataset.
Abstract: We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency modulated excitations. State-of-the-art results are achieved for signal reconstruction and phone segment classification on the TIMIT dataset.
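As a rough illustration of the idea, not the authors' implementation: one can build a crude wavelet scalogram, convolve it with a single 2-D wavelet over (log-frequency, time), and pool by a global average to get one time-shift-invariant joint coefficient. Everything below (the Gabor atoms, the parameter choices, the helper names) is an assumption made for the sketch:

```python
import numpy as np

def gabor(n, xi, sigma):
    """Complex Gabor atom of length n, center frequency xi (cycles/sample)."""
    t = np.arange(n) - n // 2
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * xi * t)

def scalogram(x, xis):
    """Crude wavelet scalogram: |x * psi_xi| for log-spaced center
    frequencies (rows = log-frequency, columns = time)."""
    return np.array([
        np.abs(np.convolve(x, gabor(len(x), xi, 2.0 / xi), mode="same"))
        for xi in xis
    ])

def joint_coeff(S, wav2d):
    """One joint coefficient: modulus of the 2-D convolution of the
    scalogram with a 2-D wavelet, pooled by a global average (the
    time-shift-invariant step)."""
    F = np.fft.fft2(S) * np.fft.fft2(wav2d, s=S.shape)
    return np.abs(np.fft.ifft2(F)).mean()

# toy frequency-modulated signal: an upward chirp
t = np.linspace(0.0, 1.0, 1024)
x = np.cos(2 * np.pi * (50.0 * t + 200.0 * t**2))
S = scalogram(x, np.geomspace(0.02, 0.4, 12))
# separable 2-D wavelet over the (log-frequency, time) axes
wav2d = np.outer(gabor(6, 0.25, 2.0), gabor(48, 0.05, 12.0))
print(joint_coeff(S, wav2d))
```

A real joint scattering transform uses a full bank of 2-D wavelets at multiple scales and orientations and local (rather than global) averaging; this sketch only shows the shape of one such coefficient.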

28 citations

Journal ArticleDOI
TL;DR: A two-stage approach with two DNN-based methods addresses dereverberation and separation in the monaural source separation problem, outperforming the state of the art, particularly in highly reverberant room environments.
Abstract: Deep neural networks (DNNs) have been used for dereverberation and separation in the monaural source separation problem. However, the performance of current state-of-the-art methods is limited, particularly when applied in highly reverberant room environments. In this paper, we propose a two-stage approach with two DNN-based methods to address this problem. In the first stage, the dereverberation of the speech mixture is achieved with the proposed dereverberation mask (DM). In the second stage, the dereverberant speech mixture is separated with the ideal ratio mask (IRM). To realize this two-stage approach, in the first DNN-based method, the DM is integrated with the IRM to generate the enhanced time-frequency (T-F) mask, namely the ideal enhanced mask (IEM), as the training target for the single DNN. In the second DNN-based method, the DM and the IRM are predicted with two individual DNNs. The IEEE and TIMIT corpora with real room impulse responses and noise from the NOISEX dataset are used to generate speech mixtures for evaluations. The proposed methods outperform the state of the art, specifically in highly reverberant room environments.
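For reference, the ideal ratio mask used in the second stage has a standard closed form. A minimal sketch, assuming magnitude spectrograms and the common beta = 0.5 exponent; the paper's DM and IEM are separate constructions built on masks of this kind, and the function name here is hypothetical:

```python
import numpy as np

def ideal_ratio_mask(S_mag, N_mag, beta=0.5):
    """Ideal ratio mask from clean-speech and interference magnitude
    spectrograms (frequency x frames).

    IRM = (|S|^2 / (|S|^2 + |N|^2))^beta -- a standard T-F mask
    definition; each entry lies in [0, 1] and approaches 1 where
    speech dominates the T-F bin.
    """
    eps = 1e-12  # avoid division by zero in silent bins
    snr = S_mag**2 / (S_mag**2 + N_mag**2 + eps)
    return snr**beta

rng = np.random.default_rng(1)
S = np.abs(rng.normal(size=(257, 100)))  # toy clean-speech magnitudes
N = np.abs(rng.normal(size=(257, 100)))  # toy interference magnitudes
M = ideal_ratio_mask(S, N)
# applying the mask to a (toy, additive-magnitude) mixture attenuates
# the noise-dominated bins
enhanced = M * (S + N)
print(M.min(), M.max())
```

In a trained system the DNN predicts the mask from mixture features; the oracle mask above is only the training target.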

28 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years

Year  Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95