Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
08 Dec 2008
TL;DR: Two methods for adding a confidence measure (CM) to binary SVM outputs using trainable intelligent systems are described: the first simulates Platt's method with a neural network, while the second forms a linear combination of Platt sigmoid functions using a multi-layer perceptron.
Abstract: Although the recognition results of support vector machines are very promising in many applications, there is a gap between the accuracy of SVM-based speech recognizers and time-series models (e.g. HMMs). The main reason is the lack of a reliable confidence measure (CM) in SVM outputs. This paper describes two methods for adding a CM to binary SVM outputs using trainable intelligent systems. The first method simulates Platt's method using a neural network, while the second is a linear combination of Platt sigmoid functions using a multi-layer perceptron. The results of experiments, run on a set of confusable phonemes from the TIMIT corpus, show that the second method performs better than the first; e.g., after rejecting 20% of classifications by CM, the achieved error rates for the "/p/,/t/", "/p/,/q/" and "/t/,/q/" phoneme pairs are 3.86%, 2.1% and 0.6% respectively, while the error rate is much higher without employing neural networks. However, as the number of phonemes increases, the performance of the second method approaches that of the first.
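The first method the abstract refers to builds on Platt-style calibration of SVM margins. Below is a minimal sketch of that idea, not the authors' implementation: an RBF SVM is trained, a sigmoid is fitted to its held-out decision values via logistic regression to produce posterior-like confidences, and the least confident classifications are rejected. The synthetic data, feature dimension and rejection threshold are placeholders, not taken from the paper.

```python
# A minimal sketch (not the authors' code) of Platt-style calibration:
# an SVM's raw decision values are mapped to posterior-like confidences
# by fitting a sigmoid, here via logistic regression on held-out scores.
# Dataset and feature choices below are placeholders, not from the paper.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Stand-in for acoustic features of two confusable phonemes (e.g. /p/ vs /t/).
X, y = make_classification(n_samples=2000, n_features=39, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)

# Platt scaling: fit sigmoid(A*f + B) to the SVM margins on a held-out set.
scores_cal = svm.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression().fit(scores_cal, y_cal)

def confidence(x):
    """Posterior-like confidence measure for a binary SVM decision."""
    f = svm.decision_function(x).reshape(-1, 1)
    return platt.predict_proba(f)[:, 1]

# Rejection rule: discard the least confident classifications (e.g. 20%),
# mirroring the rejection experiment described in the abstract.
conf = confidence(X_cal)
margin = np.abs(conf - 0.5)
keep = margin >= np.quantile(margin, 0.20)
accepted_acc = np.mean((conf[keep] > 0.5) == y_cal[keep])
print(f"accuracy on accepted 80%: {accepted_acc:.3f}")
```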

2 citations

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A novel two-layer decision model based on noise classification is proposed to detect voice activity robustly; experimental results show that the method outperforms a global classifier, especially in low-SNR conditions.
Abstract: Generally, the performance of endpoint detection is affected by noise. In this paper, we propose a novel two-layer decision model based on noise classification to detect voice activity robustly. The training process mainly consists of two steps: first, we employ the NOISEX-92 database, which consists of different types of pure noise, to train a BP neural network to classify the noise type precisely; second, we train a BP neural network for each noise type covering a large range of signal-to-noise ratios (SNRs). In the testing phase, we assume that the short period of silence at the beginning of the signal contains features of the noise and use them to determine the noise type. Then, we use the classifier corresponding to that noise type to detect voice activity. We conduct experiments on the TIMIT corpus for 5 noise types under 7 SNR conditions. Experimental results show that our method outperforms a global classifier, especially in low-SNR conditions.
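A minimal sketch of the two-layer decision structure described above, under the assumption that both layers can be approximated by generic feed-forward classifiers: a first network picks the noise type from the leading non-speech frames, then a per-noise-type network makes the frame-level speech/non-speech decision. scikit-learn's MLPClassifier stands in for the BP networks, and random vectors stand in for NOISEX-92/TIMIT features.

```python
# A minimal sketch, with synthetic data, of the two-layer decision idea:
# a first network identifies the noise type from the leading non-speech
# frames, then a noise-type-specific network does the frame-level
# speech/non-speech decision. MLPClassifier stands in for the BP networks;
# NOISEX-92/TIMIT features are replaced with random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
NOISE_TYPES = ["white", "babble", "factory", "pink", "car"]
DIM = 24  # placeholder frame feature dimension (e.g. log mel energies)

# Layer 1: noise-type classifier trained on pure-noise frames.
noise_X = rng.normal(size=(len(NOISE_TYPES) * 200, DIM))
noise_y = np.repeat(np.arange(len(NOISE_TYPES)), 200)
noise_clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                          random_state=0).fit(noise_X, noise_y)

# Layer 2: one speech/non-speech classifier per noise type, trained over
# a range of SNRs (here just random stand-in frames).
vad_clfs = {}
for k, _ in enumerate(NOISE_TYPES):
    Xk = rng.normal(size=(400, DIM))
    yk = rng.integers(0, 2, size=400)  # 1 = speech frame, 0 = non-speech
    vad_clfs[k] = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                                random_state=k).fit(Xk, yk)

def detect(frames, n_leading_silence=10):
    """Two-layer decision: pick the noise type from the leading silence,
    then run the matching VAD classifier on all frames."""
    noise_type = np.bincount(
        noise_clf.predict(frames[:n_leading_silence])).argmax()
    return vad_clfs[noise_type].predict(frames)

print(detect(rng.normal(size=(100, DIM)))[:20])
```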

2 citations

Proceedings ArticleDOI
07 Mar 2000
TL;DR: Simulation results for classifying the utterances show that the size of the BDRNN required is very small compared to multilayer perceptron networks with time-delayed feedforward connections.
Abstract: The objective of this paper is to recognize speech based on speech prediction techniques using a discrete-time recurrent neural network (DTRNN) with a block diagonal feedback weight matrix, called the block diagonal recurrent neural network (BDRNN). The ability of this network has been investigated for the TIMIT isolated digits spoken by a representative speaker. Simulation results for classifying the utterances show that the size of the BDRNN required is very small compared to multilayer perceptron networks with time-delayed feedforward connections.
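The defining constraint of the BDRNN is its block diagonal feedback weight matrix. The sketch below illustrates that structure in plain NumPy, with untrained random weights and a simple one-step prediction head; the sizes and the prediction-error scoring are illustrative assumptions, not values from the paper.

```python
# A minimal NumPy sketch of the block diagonal recurrent idea: the
# hidden-to-hidden feedback matrix is constrained to 2x2 blocks along the
# diagonal, which keeps the number of recurrent weights small. Sizes and
# the prediction head are illustrative, not taken from the paper.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
INPUT_DIM, HIDDEN_DIM = 13, 16          # HIDDEN_DIM must be even (2x2 blocks)

# Block diagonal feedback weight matrix: HIDDEN_DIM/2 independent 2x2 blocks.
blocks = [rng.normal(scale=0.5, size=(2, 2)) for _ in range(HIDDEN_DIM // 2)]
W_fb = block_diag(*blocks)              # (HIDDEN_DIM, HIDDEN_DIM), mostly zeros
W_in = rng.normal(scale=0.1, size=(HIDDEN_DIM, INPUT_DIM))
W_out = rng.normal(scale=0.1, size=(INPUT_DIM, HIDDEN_DIM))

def predict_next_frames(frames):
    """Run the recurrent predictor over a (T, INPUT_DIM) feature sequence and
    return one-step-ahead predictions; the prediction error can then serve as
    a per-class score, as in prediction-based recognition."""
    h = np.zeros(HIDDEN_DIM)
    preds = []
    for x in frames:
        h = np.tanh(W_fb @ h + W_in @ x)
        preds.append(W_out @ h)
    return np.array(preds)

frames = rng.normal(size=(50, INPUT_DIM))       # stand-in cepstral frames
pred = predict_next_frames(frames)
err = np.mean((pred[:-1] - frames[1:]) ** 2)    # one-step prediction residual
print(f"mean one-step prediction error: {err:.3f}")
```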

2 citations

Journal ArticleDOI
TL;DR: This work proposes a novel noise-invariant speech enhancement method which manipulates the latent features to distinguish between the speech and noise features in the intermediate layers using an adversarial training scheme, and offers a more robust noise-invariant property than conventional speech enhancement techniques.
Abstract: Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architecture as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to achieve better performance for speech enhancement. With the recent success in learning disentangled representations using neural networks, we explore a framework for disentangling speech and noise, which has not been exploited in conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method which manipulates the latent features to distinguish between the speech and noise features in the intermediate layers using an adversarial training scheme. To compare the performance of the proposed method with other conventional algorithms, we conducted experiments in both matched and mismatched noise conditions using the TIMIT and TSP speech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but also offers a more robust noise-invariant property than conventional speech enhancement techniques.
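A minimal PyTorch sketch of the adversarial disentanglement idea described above, not the authors' architecture: an encoder maps noisy frames to a latent feature, a discriminator tries to recognise the noise type from that latent, and the encoder is trained both to reconstruct clean speech and to fool the discriminator, pushing noise information out of the latent. The layer sizes, adversarial weight and random stand-in data are assumptions.

```python
# A minimal PyTorch sketch (an assumption, not the authors' code) of the
# adversarial idea: an encoder produces a latent feature from noisy speech,
# a discriminator tries to recognise the noise type from that latent, and
# the encoder is also trained to fool the discriminator so the latent
# becomes noise-invariant while an enhancement head reconstructs clean speech.
import torch
import torch.nn as nn

FEAT, LATENT, N_NOISE = 257, 128, 5       # placeholder spectrum/latent sizes

encoder = nn.Sequential(nn.Linear(FEAT, LATENT), nn.ReLU())
enhancer = nn.Linear(LATENT, FEAT)        # predicts the clean spectrum
discrim = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, N_NOISE))

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(enhancer.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discrim.parameters(), lr=1e-3)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

noisy = torch.randn(32, FEAT)             # stand-in noisy frames
clean = torch.randn(32, FEAT)             # stand-in clean targets
noise_type = torch.randint(0, N_NOISE, (32,))

for step in range(100):
    # 1) Discriminator step: learn to identify the noise type from the latent.
    z = encoder(noisy).detach()
    opt_disc.zero_grad()
    ce(discrim(z), noise_type).backward()
    opt_disc.step()

    # 2) Main step: reconstruct clean speech AND fool the discriminator
    #    (maximise its loss), pushing noise information out of the latent.
    opt_main.zero_grad()
    z = encoder(noisy)
    loss = mse(enhancer(z), clean) - 0.1 * ce(discrim(z), noise_type)
    loss.backward()
    opt_main.step()
```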

2 citations

Proceedings ArticleDOI
15 Jun 2011
TL;DR: This paper investigates improvements in phoneme classification and recognition using an ensemble of small multi-layer perceptrons (MLPs) instead of a large monolithic MLP.
Abstract: In this paper we investigate improvements in phoneme classification and recognition using an ensemble of small multi-layer perceptrons (MLPs) instead of a large monolithic MLP. The ensemble members adopt different input context spans. The ensemble is trained using the AdaBoost algorithm, and the output posteriors are combined according to two combination rules, a static one based on classifier error and an adaptive one based on inverse entropy. The proposed method improves accuracy without increasing the total number of connection weights. Experimental results on the TIMIT corpus show promising improvements in phoneme classification and recognition rates.
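The sketch below illustrates the posterior-combination side of the ensemble described above: several small MLPs, each fed a different input context span, produce phoneme posteriors that are merged with inverse-entropy weights. The AdaBoost training stage is omitted, and the random features, context spans and layer sizes are placeholders rather than settings from the paper.

```python
# A minimal sketch of the posterior-combination side of the ensemble idea:
# several small MLPs, each seeing a different input context span, emit
# phoneme posteriors that are merged with inverse-entropy weights (frames
# where a member is more certain get more weight). AdaBoost training of
# the members is omitted; the data below is a random stand-in for TIMIT.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
N_PHONES, FRAME_DIM = 10, 13           # placeholder sizes
CONTEXTS = [1, 3, 5]                   # context span (frames) per member

X_frames = rng.normal(size=(3000, FRAME_DIM))
y = rng.integers(0, N_PHONES, size=3000)

def stack_context(frames, span):
    """Stack +/- span//2 neighbouring frames into one input vector."""
    half = span // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(span)])

members = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=c)
    .fit(stack_context(X_frames, c), y)
    for c in CONTEXTS
]

def combine_inverse_entropy(frames):
    """Weight each member's posterior by the inverse of its entropy."""
    posts, weights = [], []
    for clf, c in zip(members, CONTEXTS):
        p = clf.predict_proba(stack_context(frames, c))
        ent = -np.sum(p * np.log(p + 1e-12), axis=1, keepdims=True)
        posts.append(p)
        weights.append(1.0 / (ent + 1e-12))
    w = np.stack(weights) / np.sum(weights, axis=0)
    return np.sum(w * np.stack(posts), axis=0)

print(combine_inverse_entropy(X_frames[:5]).argmax(axis=1))
```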

2 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95