
TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
01 Sep 2016
TL;DR: The proposed feature transformation improves phone recognition accuracy compared with classical methods using conventional cepstral feature vectors, for HMMs with fewer than 16 Gaussians per state.
Abstract: In this paper, we propose a novel vector transformation that projects feature vectors into a new space with good discriminant properties while drastically reducing the number of parameters used in ASR systems. We call this method the "N-to-1 Gaussian MFCC transformation". It uses the HMM acoustic parameters obtained with N Gaussians and with 1 Gaussian during training to compute the transformed vectors in the new projection space. Our transformation technique permits a substantial reduction in the number of Gaussians (in the GMM modeling of each state's emission probability) while improving the performance of ASR systems. Our experimental results on both the TIMIT and FPSD corpora demonstrate that the proposed feature transformation improves phone recognition accuracy compared with classical methods using conventional cepstral feature vectors when the HMMs use fewer than 16 Gaussians per state.
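The abstract does not spell out the transformation itself, but the quantity it aims to shrink is the per-state GMM emission probability. A minimal numpy sketch of that likelihood, assuming diagonal covariances (all names here are illustrative, not the paper's code):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log emission probability of one feature vector x under a
    diagonal-covariance GMM -- the per-state model whose component
    count N the paper's transformation tries to reduce toward 1.

    weights:   (N,)    mixture weights, summing to 1
    means:     (N, D)  component means
    variances: (N, D)  diagonal covariances
    """
    d = x.shape[0]
    # log N(x; mu_k, diag(var_k)) for every component k
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_expo = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    # log sum_k w_k N_k, computed with the log-sum-exp trick for stability
    log_comp = np.log(weights) + log_norm + log_expo
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

With N = 1 this collapses to a single Gaussian, which is the cheap end of the trade-off the paper's "N-to-1" projection is designed to approach without losing accuracy.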
Proceedings ArticleDOI
04 Aug 2020
TL;DR: A novel method combining autoencoders and deep neural networks (DNNs) in a hierarchical structure is proposed for speech enhancement, showing significant improvement on both seen and unseen noises compared to baselines.
Abstract: In this paper, we propose a novel method that combines autoencoders and deep neural networks (DNNs) in a hierarchical structure for speech enhancement. First, two parallel autoencoders are employed to obtain the nonnegative matrix factorization (NMF) parameters of speech and noise through a nonlinear mapping. Then, using the spectrum of noisy speech as input to the encoder portion of the autoencoders, the encoder outputs are computed and fed to the DNNs in the subsequent hierarchies to further enhance the speech spectrum. Finally, the last three hierarchies, comprising the decoder portions of the autoencoders and the DNN, are trained jointly to improve the results. The proposed method is evaluated on the TIMIT corpus with the perceptual evaluation of speech quality (PESQ) and frequency-weighted segmental signal-to-noise ratio (fwSNRseg) criteria. The obtained results show a significant improvement on both seen and unseen noises compared to baselines.
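The paper replaces the linear NMF step with learned autoencoders; for context, the classical NMF baseline it generalizes factorizes a nonnegative magnitude spectrogram with multiplicative updates. A hedged numpy sketch of that baseline (Lee–Seung Euclidean updates; not the paper's network):

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-10):
    """Factorize a nonnegative magnitude spectrogram V (freq x time)
    as V ~= W @ H using Lee-Seung multiplicative updates.
    W holds spectral basis vectors, H their time-varying activations;
    in the paper a nonlinear encoder/decoder plays these roles."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H nonnegative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Training one such factorization on clean speech and another on noise, then concatenating the bases, is the standard NMF enhancement recipe the hierarchical model builds on.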
Book ChapterDOI
11 Sep 2006
TL;DR: This paper presents an alternative method for generating phonetic questions automatically from misrecognition items; the questions are tested on the standard TIMIT phone recognition task.
Abstract: Most automatic speech recognition systems are currently based on tied-state triphones. These tied states are usually determined by a decision tree. Decision trees can automatically cluster triphone states into many classes according to the available data, allowing each class to be trained efficiently. To achieve higher accuracy, this clustering is constrained by manually generated phonetic questions. Moreover, the tree generated from these phonetic questions can be used to synthesize unseen triphones. The quality of the decision trees therefore depends on the quality of the phonetic questions. Unfortunately, manual creation of phonetic questions requires a lot of time and resources. To overcome this problem, this paper presents an alternative method for generating these phonetic questions automatically from misrecognition items. These questions are tested using the standard TIMIT phone recognition task.
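To make the role of phonetic questions concrete, here is a simplified sketch of choosing the best question to split a pool of context-dependent training items. Real toolkits maximize Gaussian log-likelihood gain over state occupancy statistics; this sketch substitutes label entropy for readability, and all names are illustrative:

```python
import math
from collections import Counter

def best_question(states, questions):
    """Pick the phonetic question with the largest entropy reduction
    when splitting a set of (context_phone, label) training items.

    states:    list of (context_phone, observed_label) pairs
    questions: dict mapping question name -> set of phones answering "yes"
    """
    def entropy(items):
        if not items:
            return 0.0
        counts = Counter(lbl for _, lbl in items)
        n = len(items)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    base = entropy(states)
    best, best_gain = None, -1.0
    for name, phone_set in questions.items():
        yes = [s for s in states if s[0] in phone_set]
        no = [s for s in states if s[0] not in phone_set]
        n = len(states)
        # weighted entropy of the two children after the split
        split_h = (len(yes) * entropy(yes) + len(no) * entropy(no)) / n
        gain = base - split_h
        if gain > best_gain:
            best, best_gain = name, gain
    return best, best_gain
```

The paper's contribution is where the candidate question sets come from (derived from misrecognition items instead of hand-written phonetics), not the split criterion itself.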
Journal ArticleDOI
TL;DR: In this paper, a new feature extraction approach for robust speaker recognition named Power Normalized Gammachirp Cepstral (PNGC) is introduced, which uses a biologically motivated auditory perceptual model.
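The TL;DR gives no implementation detail, so the following is only a hypothetical sketch of the "power normalization" ingredient as it appears in related features such as PNCC: the log compression of auditory filterbank energies is replaced by a power-law nonlinearity before the DCT that yields cepstral coefficients. The gammachirp filterbank itself and the paper's actual normalization are not shown here, and the exponent value is an assumption:

```python
import numpy as np
from scipy.fft import dct

def power_law_cepstra(filterbank_energies, power=1 / 15, n_coeffs=13):
    """Hypothetical power-normalization step (PNCC-style, assumed):
    compress nonnegative auditory filterbank energies with a power law
    instead of a log, then take a DCT to obtain cepstral coefficients.

    filterbank_energies: (n_frames, n_bands) nonnegative energies
    """
    # power-law compression is less sensitive to low-energy noise
    # than log compression, which diverges as energy -> 0
    compressed = np.power(np.maximum(filterbank_energies, 1e-12), power)
    return dct(compressed, type=2, norm="ortho", axis=1)[:, :n_coeffs]
```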
Proceedings ArticleDOI
11 Dec 2022
TL;DR: In this article, a multi-task joint learning scheme is proposed to improve embedding-aware audio-visual speech enhancement by adopting the phone and the articulation place together as classification targets during the training of the embedding extractor and the enhancement network.
Abstract: In this paper, we propose a multi-task joint learning scheme to improve embedding-aware audio-visual speech enhancement by adopting the phone and the articulation place together as classification targets during the training of the embedding extractor and the enhancement network. First, a multimodal embedding is extracted from noisy speech and lip frames, supervised jointly by articulation-place and phone labels. Next, we train the embedding extractor and the enhancement network jointly, with the ideal ratio mask, the phone posterior, and the place posterior as learning objectives. Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that the proposed multimodal embedding at the multi-scale class level is more effective than the previous embedding at the place/phone level, and that the multi-task joint learning framework further improves speech quality and intelligibility.
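One of the joint-training targets named above, the ideal ratio mask (IRM), has a standard closed form: per time-frequency bin, the fraction of energy belonging to speech. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ideal ratio mask: sqrt(S^2 / (S^2 + N^2)) per time-frequency bin,
    computed from the magnitude spectrograms of clean speech and noise.
    Values near 1 mark speech-dominated bins, near 0 noise-dominated ones."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def apply_mask(noisy_mag, mask):
    # element-wise masking of the noisy magnitude spectrogram
    return noisy_mag * mask
```

At training time the network regresses this mask (alongside the phone and place posteriors); at test time the predicted mask is applied to the noisy spectrogram to suppress noise-dominated bins.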

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023      24
2022      62
2021      67
2020      86
2019      77
2018      95