Topic
TIMIT
About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
••
08 Dec 2008
TL;DR: Two methods for adding a confidence measure (CM) to binary SVM outputs using trainable intelligent systems are described: the first simulates Platt's method with a neural network, while the second is a linear combination of Platt sigmoid functions using a multi-layer perceptron.
Abstract: Although the recognition results of support vector machines are promising in many applications, there is a gap between the accuracy of SVM-based speech recognizers and that of time-series models (e.g., HMMs). The main reason is the lack of a reliable confidence measure (CM) in SVM outputs. This paper describes two methods for adding a CM to binary SVM outputs using trainable intelligent systems. The first method simulates Platt's method with a neural network, while the second is a linear combination of Platt sigmoid functions using a multi-layer perceptron. Experiments on a set of confusable phonemes from the TIMIT corpus show that the second method outperforms the first: after rejecting 20% of classifications by CM, the error rates for the "/p/,/t/", "/p/,/q/" and "/t/,/q/" phoneme pairs are 3.86%, 2.1% and 0.6% respectively, whereas the error rate is much higher without the neural networks. However, as the number of phonemes increases, the performance of the second method approaches that of the first.
2 citations
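The standard Platt calibration that the first method above simulates with a neural network can be sketched directly: fit a two-parameter sigmoid mapping raw SVM decision values to posterior probabilities. A minimal sketch, assuming plain gradient descent on the negative log-likelihood and illustrative toy data (the paper's neural-network variants are not reproduced here):

```python
import numpy as np

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit Platt's sigmoid P(y=1|f) = 1 / (1 + exp(A*f + B)) to raw SVM
    decision values f by gradient descent on the negative log-likelihood."""
    f = np.asarray(scores, float)
    y = np.asarray(labels, float)      # labels in {0, 1}
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * f + B))
        gA = np.mean((y - p) * f)      # dNLL/dA
        gB = np.mean(y - p)            # dNLL/dB
        A -= lr * gA
        B -= lr * gB
    return A, B

# toy data: the positive class has larger decision values
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 0, 1, 1, 1])
A, B = fit_platt(scores, labels)
prob = 1.0 / (1.0 + np.exp(A * 1.5 + B))   # calibrated P(y=1 | f=1.5)
```

The calibrated probability `prob` for a strongly positive decision value should be close to 1; rejecting classifications whose calibrated confidence falls below a threshold yields the CM-based rejection used in the experiments.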
••
01 Aug 2016
TL;DR: A novel two-layer decision model based on noise classification is proposed to detect voice activity robustly; experimental results show that the method outperforms a global classifier, especially in low-SNR conditions.
Abstract: In general, the performance of endpoint detection is affected by noise. In this paper, we propose a novel two-layer decision model based on noise classification to detect voice activity robustly. The training process contains two main steps: first, we use the NOISEX-92 database, which consists of different types of pure noise, to train a BP neural network to classify the noise type precisely; second, we train a BP neural network for each noise type covering a large range of signal-to-noise ratios (SNR). In the testing phase, we assume that the short period of silence at the beginning of the signal contains noise features and use them to determine the noise type. Then, we apply the classifier corresponding to that noise type to detect voice activity. We conduct experiments on the TIMIT corpus for 5 noise types under 7 SNR conditions. Experimental results show that our method outperforms a global classifier, especially in low-SNR conditions.
2 citations
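The two-layer decision above (classify the noise type from the leading silence frames, then dispatch to a detector trained for that noise type) can be sketched with placeholders. This is a minimal sketch only: the nearest-centroid noise classifier, the per-frame feature, and the threshold detectors stand in for the paper's BP neural networks.

```python
import numpy as np

def classify_noise(leading_frames, centroids):
    """Layer 1: pick the noise type whose feature centroid is nearest
    to the mean feature of the leading silence frames."""
    mu = leading_frames.mean(axis=0)
    dists = {name: np.linalg.norm(mu - c) for name, c in centroids.items()}
    return min(dists, key=dists.get)

def detect_voice(frames, detectors, noise_type):
    """Layer 2: apply the detector trained for that noise type."""
    return detectors[noise_type](frames)

# toy setup: a 1-D "feature" per frame, one threshold detector per noise type
centroids = {"white": np.array([0.0]), "babble": np.array([3.0])}
detectors = {
    "white":  lambda x: x[:, 0] > 1.5,   # threshold tuned for white noise
    "babble": lambda x: x[:, 0] > 4.5,   # higher threshold for babble
}

frames = np.array([[0.1], [0.2], [2.5], [2.8], [0.1]])
noise = classify_noise(frames[:2], centroids)      # leading frames = noise only
activity = detect_voice(frames, detectors, noise)  # per-frame voice decisions
```

The key design point survives the simplification: the second-layer detector is specialized to the detected noise type, rather than being a single global classifier over all noise conditions.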
••
07 Mar 2000
TL;DR: Simulation results for classifying the utterances show that the required size of the BDRNN is very small compared to multilayer perceptron networks with time-delayed feedforward connections.
Abstract: The objective of this paper is to recognize speech based on speech prediction techniques using a discrete-time recurrent neural network (DTRNN) with a block-diagonal feedback weight matrix, called the block diagonal recurrent neural network (BDRNN). The ability of this network has been investigated for the TIMIT isolated digits spoken by a representative speaker. Simulation results for classifying the utterances show that the required size of the BDRNN is very small compared to multilayer perceptron networks with time-delayed feedforward connections.
2 citations
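The defining constraint of the BDRNN is that the recurrent (feedback) weight matrix is block-diagonal rather than dense, which sharply reduces the number of trainable weights. A minimal sketch of that structure and one recurrence step, with illustrative block sizes and a tanh nonlinearity assumed:

```python
import numpy as np

def block_diagonal(blocks):
    """Assemble a block-diagonal matrix from a list of square blocks."""
    n = sum(b.shape[0] for b in blocks)
    W = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        W[i:i + k, i:i + k] = b
        i += k
    return W

# feedback weights constrained to 2x2 blocks; three blocks give a
# 6-unit hidden state with only 12 recurrent weights instead of 36
blocks = [np.array([[0.5, -0.3], [0.3, 0.5]]) for _ in range(3)]
W_fb = block_diagonal(blocks)                       # 6x6 block-diagonal matrix
W_in = np.random.default_rng(0).normal(size=(6, 1)) * 0.1

def step(h, x):
    """One DTRNN recurrence: h[t] = tanh(W_fb @ h[t-1] + W_in @ x[t])."""
    return np.tanh(W_fb @ h + W_in @ x)

h = np.zeros(6)
h = step(h, np.array([1.0]))   # hidden state after one input sample
```

Because cross-block feedback entries are fixed at zero, both the parameter count and the per-step cost scale with the block size rather than the full hidden dimension, which is why the paper can report a very small network.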
••
TL;DR: This work proposes a novel noise-invariant speech enhancement method that manipulates the latent features to distinguish between speech and noise features in the intermediate layers using an adversarial training scheme, and offers a more robust noise-invariance property than conventional speech enhancement techniques.
Abstract: Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architecture as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to achieve better speech enhancement performance. Motivated by the recent success in learning disentangled representations with neural networks, we explore a framework for disentangling speech and noise, which has not been exploited in conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method that manipulates the latent features to distinguish between speech and noise features in the intermediate layers using an adversarial training scheme. To compare the performance of the proposed method with other conventional algorithms, we conducted experiments in both matched and mismatched noise conditions using the TIMIT and TSPspeech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but also offers a more robust noise-invariance property than conventional speech enhancement techniques.
2 citations
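One common way to realize the adversarial scheme described above is a gradient reversal layer: a discriminator tries to predict the noise label from the latent features, while the encoder receives the sign-flipped gradient so the latent hides that label. The sketch below is an assumption about the mechanism, not the paper's exact architecture; the tiny linear encoder/discriminator and BCE loss are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 8)) * 0.1   # encoder: 8-dim input -> 4-dim latent
W_dis = rng.normal(size=(1, 4)) * 0.1   # discriminator predicts a noise label

def forward(x):
    z = np.tanh(W_enc @ x)              # latent features
    logit = W_dis @ z                   # discriminator output
    return z, logit

def adversarial_grads(x, noise_label, lam=1.0):
    """Discriminator descends its BCE loss; the encoder receives the
    REVERSED gradient, pushing the latent to hide the noise label."""
    z, logit = forward(x)
    p = 1.0 / (1.0 + np.exp(-logit))
    dlogit = p - noise_label            # grad of BCE w.r.t. the logit
    g_dis = np.outer(dlogit, z)         # discriminator update direction
    dz = W_dis.T @ dlogit               # backprop into the latent
    dpre = dz * (1.0 - z ** 2)          # through the tanh
    g_enc = -lam * np.outer(dpre, x)    # sign flipped: gradient reversal
    return g_enc, g_dis
```

Scaling `lam` trades off how strongly the encoder is pushed toward noise-invariance against the enhancement objective (which is omitted from this sketch).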
••
15 Jun 2011
TL;DR: This paper investigates improvements in phoneme classification and recognition using an ensemble of small multi-layer perceptrons (MLPs) instead of a large monolithic MLP.
Abstract: In this paper we investigate improvements in phoneme classification and recognition using an ensemble of small multi-layer perceptrons (MLPs) instead of a large monolithic MLP. The ensemble members adopt different input context spans. The ensemble is trained with the AdaBoost algorithm, and the output posteriors are combined according to two combination rules, one static and one adaptive: weighting based on static classifier error, and inverse-entropy weighting. The proposed method improves accuracy without increasing the total number of connection weights. Experimental results on the TIMIT corpus show promising improvements in phoneme classification and recognition rates.
2 citations
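The adaptive rule mentioned above, inverse-entropy weighting, gives more weight to ensemble members whose posterior distributions are confident (low entropy). A minimal sketch with illustrative toy posteriors:

```python
import numpy as np

def inverse_entropy_combine(posteriors, eps=1e-12):
    """Combine per-classifier posterior vectors, weighting each MLP by
    the inverse entropy of its output (confident experts count more)."""
    P = np.asarray(posteriors, float)          # shape: (n_experts, n_classes)
    H = -np.sum(P * np.log(P + eps), axis=1)   # entropy of each expert
    w = 1.0 / (H + eps)
    w /= w.sum()                               # normalize the weights
    return w @ P                               # weighted combined posterior

# a confident expert and an uncertain one
p1 = [0.90, 0.05, 0.05]
p2 = [0.40, 0.30, 0.30]
combined = inverse_entropy_combine([p1, p2])
```

The confident expert dominates the combination, so the combined posterior stays sharply peaked on its predicted class while remaining a valid distribution.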