Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
04 May 2014
TL;DR: Experiments show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating monophonic mixtures of speech signals from known speakers.
Abstract: Owing to the non-negativity of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has achieved promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NMF fails to represent the mixture signals accurately because the dictionaries for the speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) that jointly learns a dictionary on both the speech signals of each speaker and the mixture signals to be separated. Since TNMF encodes the mixture signals and thus learns a more descriptive dictionary than NMF does, it significantly boosts separation performance. Experimental results on the popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating monophonic mixtures of speech signals from known speakers.
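For context, here is a minimal sketch of the supervised-NMF separation baseline that TNMF extends, assuming the speaker and mixture magnitude spectrograms are already available as NumPy arrays; the rank, iteration count, and function names are illustrative assumptions, not taken from the paper. TNMF would additionally include the mixture frames when the dictionaries are learned, rather than fixing them beforehand.

```python
import numpy as np

def nmf(V, rank, n_iter=200, W=None, eps=1e-10):
    """Multiplicative-update NMF: V ~ W @ H. A supplied W is kept fixed."""
    rng = np.random.default_rng(0)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_mix, V_spk1, V_spk2, rank=40):
    """Learn one dictionary per speaker, then explain the mixture with both."""
    W1, _ = nmf(V_spk1, rank)
    W2, _ = nmf(V_spk2, rank)
    W = np.hstack([W1, W2])                # concatenated speaker dictionaries
    _, H = nmf(V_mix, 2 * rank, W=W)       # encode the mixture; W stays fixed
    V1, V2 = W1 @ H[:rank], W2 @ H[rank:]  # per-speaker reconstructions
    mask = V1 / (V1 + V2 + 1e-10)          # soft Wiener-like mask
    return mask * V_mix, (1 - mask) * V_mix
```

Separated waveforms would then be resynthesized by applying the masks to the complex mixture spectrogram and inverting the STFT.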

13 citations

Proceedings ArticleDOI
13 Dec 2010
TL;DR: An effective algorithm for classification of one group of phonemes, namely the unvoiced fricatives, which are characterized by a relatively large amount of spectral energy in the high frequency range is presented.
Abstract: Classification of phonemes is the process of assigning a phonetic category to a short section of speech signal. It is a key stage in various applications such as Spoken Term Detection, continuous speech recognition, and music-to-lyrics synchronization, but it can also be useful on its own, for example in the professional music industry and in applications for the hearing impaired. In this study we present an effective algorithm for classifying one group of phonemes, namely the unvoiced fricatives, which are characterized by a relatively large amount of spectral energy in the high frequency range. Classification between individual phonemes within this group is fairly difficult because their acoustic-phonetic characteristics are quite similar. A three-stage classification algorithm for the unvoiced fricatives is utilized. In the first, preprocessing stage, each phoneme segment is divided into consecutive non-overlapping short windowed frames, each of which is represented by a 15-dimensional feature vector. In the second stage a support vector machine (SVM) is trained, using a radial basis kernel function and an automatic grid search to optimize the SVM parameters. A tree-based algorithm is used in the classification stage, where the phonemes are first classified into two subgroups according to their articulation: the sibilants (/s/ and /sh/) and the non-sibilants (/f/ and /th/). Each subgroup is then further classified using another SVM. To evaluate the performance of the algorithm we used more than 11000 phonemes extracted from the TIMIT speech database. Using a majority vote over the feature vectors of the same phoneme, an overall accuracy of 85% is obtained (91% for the subset /s/, /sh/ and /f/). These results are comparable to, and in some cases better than, those achieved in other studies. The efficiency and robustness of the algorithm make it implementable in real-time applications for the hearing impaired or in recording studios.
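A hedged sketch of the tree-structured SVM stages in scikit-learn follows; the 15-dimensional frame features are assumed to be precomputed, and the grid-search ranges are placeholders rather than the paper's actual values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder RBF grid; the paper's actual search range is not specified here.
PARAM_GRID = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}

def fit_rbf_svm(X, y):
    """Grid-search an RBF-kernel SVM, mirroring the paper's second stage."""
    search = GridSearchCV(SVC(kernel="rbf"), PARAM_GRID, cv=5)
    search.fit(X, y)
    return search.best_estimator_

def train_tree(X, y):
    """X: per-frame 15-dim feature vectors; y: labels in {'s','sh','f','th'}."""
    sib = np.isin(y, ["s", "sh"])
    top = fit_rbf_svm(X, sib)                # sibilant vs. non-sibilant
    sib_svm = fit_rbf_svm(X[sib], y[sib])    # /s/ vs. /sh/
    non_svm = fit_rbf_svm(X[~sib], y[~sib])  # /f/ vs. /th/
    return top, sib_svm, non_svm

def classify_segment(frames, top, sib_svm, non_svm):
    """Classify each frame down the tree, then majority-vote over the segment."""
    labels = np.where(top.predict(frames),
                      sib_svm.predict(frames),
                      non_svm.predict(frames))
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]
```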

13 citations

Posted Content
TL;DR: In this article, an unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks is proposed, which hypothesizes phoneme boundaries from the error profile of a model trained to predict speech features frame-by-frame.
Abstract: Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
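A minimal sketch of the error-profile idea follows, with an ordinary least-squares frame predictor standing in for the paper's Markov-chain or recurrent models; the MFCC frames are assumed precomputed, and the context and peak-spacing values are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

def boundary_hypotheses(mfcc, context=3, min_gap=5):
    """mfcc: (n_frames, n_coeffs) array. Predict each frame from the previous
    `context` frames with least squares, then take local maxima of the
    prediction error as phoneme-boundary candidates."""
    n = len(mfcc)
    X = np.hstack([mfcc[i:n - context + i] for i in range(context)])
    Y = mfcc[context:]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # linear frame predictor
    err = np.linalg.norm(X @ W - Y, axis=1)    # per-frame prediction error
    peaks, _ = find_peaks(err, distance=min_gap)
    return peaks + context                     # back to original frame indices
```

An RNN variant would simply replace the linear predictor; the peak-picking step over the error profile is unchanged.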

13 citations

Posted Content
TL;DR: It is shown that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization at a very low computational cost.
Abstract: Deep neural networks (DNNs) have been demonstrated to outperform many traditional machine learning algorithms in Automatic Speech Recognition (ASR). In this paper, we show that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization at very low computational cost. Recognition tests on the popular LibriSpeech and TIMIT benchmarks support this claim: novel candidate models can be discovered and trained within a few hours (less than a day), many times faster than attention-based seq2seq models. Our method achieves a test error of 7% Word Error Rate (WER) on the LibriSpeech corpus and 13% Phone Error Rate (PER) on the TIMIT corpus, on par with state-of-the-art results.

13 citations

Proceedings ArticleDOI
01 May 2017
TL;DR: Four systems based on different speech features were combined at the score level to improve verification accuracy under clean and noisy speech conditions, reducing equal error rates by up to 44% in some cases.
Abstract: Many methods have been proposed for speaker verification that provide good results, but their performance degrades in real noisy environments. A common approach to partially alleviate this problem is the fusion of several methods. In this paper, four systems based on different speech features, i.e., MFCC, IMFCC, LFCC, and PNCC, were combined at the score level to improve verification accuracy under clean and noisy speech conditions. Pairwise and four-way fusions of the features were evaluated in a speaker verification system that models speakers with Gaussian mixture models (GMMs). The TIMIT and NOISEX92 databases were used as the speech and noise datasets, respectively. The experimental results show that score-level fusion of different feature vectors enhances the accuracy of the speaker verification system, reducing equal error rates by up to 44% in some cases.
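A hedged sketch of GMM-based score-level fusion is shown below, assuming the per-stream frame features (MFCC, IMFCC, LFCC, PNCC) are already extracted; the model size and uniform weights are placeholder assumptions, and in practice per-stream scores are usually normalized before fusing.

```python
from sklearn.mixture import GaussianMixture

def train_stream_models(features_by_speaker, n_components=64):
    """One GMM per enrolled speaker for a single feature stream."""
    return {spk: GaussianMixture(n_components, covariance_type="diag").fit(X)
            for spk, X in features_by_speaker.items()}

def fused_score(test_streams, models_per_stream, claimed_speaker, weights=None):
    """Score-level fusion: weighted sum of per-stream average log-likelihoods.
    test_streams:      {"mfcc": frames, "lfcc": frames, ...} for one utterance
    models_per_stream: {"mfcc": {speaker: gmm, ...}, ...}"""
    names = list(test_streams)
    w = weights or {name: 1.0 / len(names) for name in names}
    return sum(w[name] *
               models_per_stream[name][claimed_speaker].score(test_streams[name])
               for name in names)
```

The fused score would then be compared against a threshold, with the equal error rate read off at the operating point where false acceptances equal false rejections.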

13 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95