Topic

TIMIT

About: TIMIT (also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus) is a corpus of read American English speech widely used to benchmark speech and speaker recognition systems. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations.


Papers
Book Chapter
Tingting Wu, Diqun Yan, Li Xiang, Rangding Wang
01 Jan 2020
TL;DR: This study proposes a universal forensic algorithm that can detect four typical speech operations (pitch shifting, noise adding, low-pass filtering, and high-pass filtering) and demonstrates the algorithm's robustness against the MP3 compression attack.
Abstract: Most existing speech forensic works implicitly assume the suspected speech either has or has not been processed by a specific operation. In practice, however, the operation type performed on the target speech is usually unknown to the forensic analyst, and in most cases, multiple operations may be involved in order to conceal the forgery trace. Few works have considered these issues. In this study, we propose a universal forensic algorithm that can detect four typical speech operations: pitch shifting, noise adding, low-pass filtering, and high-pass filtering. The motivation of the proposed algorithm is based on the observation that different operations cause different effects on Mel-frequency cepstral coefficients (MFCC). The statistical moments of MFCC are extracted as detection features. Additionally, cepstral mean and variance normalization (CMVN), a computationally efficient normalization technique, is used to eliminate the impact of channel noise. Finally, an ensemble of binary classifiers is used to detect the operation type, and multiclass classifiers are adopted to identify the order of operations. The experimental results on the TIMIT and UME-ERJ datasets show that the proposed forensic features achieve good performance on operation type and order detection. Additionally, the results demonstrate the robustness of the proposed algorithm against the MP3 compression attack.
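
A minimal sketch of the feature pipeline this abstract describes, assuming librosa for MFCC extraction. The abstract does not specify which statistical moments are used or over which coefficients they are taken; since per-utterance CMVN forces each static coefficient to zero mean and unit variance, this sketch takes the moments of the delta-MFCC trajectories instead, which is an assumption rather than the paper's specification.

```python
# Hedged sketch: MFCCs -> per-utterance CMVN -> statistical moments.
# librosa and the moment set (mean/var/skew/kurtosis of delta-MFCCs)
# are assumptions; the abstract only names MFCC moments and CMVN.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def forensic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, T)

    # CMVN: normalize each coefficient trajectory over time to suppress
    # the channel component.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) \
        / (mfcc.std(axis=1, keepdims=True) + 1e-8)

    # After CMVN the static mean/variance are trivial, so take moments of
    # the first-order deltas (an assumption; see the note above).
    d = librosa.feature.delta(mfcc)
    return np.concatenate([d.mean(axis=1), d.var(axis=1),
                           skew(d, axis=1), kurtosis(d, axis=1)])
```

Each per-utterance vector would then feed the ensemble of binary classifiers (operation type) and the multiclass classifiers (operation order) described in the abstract.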

4 citations

Proceedings Article
01 Dec 2012
TL;DR: The experimental results show that the proposed method outperformed the baseline system based on both the maximum likelihood estimation (MLE) and sigmoid-based MCE and achieved a reduction in the word error rate (WER) of 28.9% when tested on the TIMIT speech database.
Abstract: This paper proposes a new class loss function as an alternative to the standard sigmoid class loss function for optimizing the parameters of decoding graphs using discriminative training based on the minimum classification error (MCE) criterion. The standard sigmoid-based approach tends to ignore a significant number of training samples that have a large difference between the scores of the reference and their corresponding competing hypotheses, which degrades parameter optimization. The proposed function overcomes this limitation by considering almost all the training samples and thus improves parameter optimization on large decoding graphs. The decoding graph used in this research is an integrated network of weighted finite state transducers. The primary task examined is a 64K-word continuous speech recognition task. The experimental results show that the proposed method outperformed baseline systems based on both maximum likelihood estimation (MLE) and sigmoid-based MCE, achieving a 28.9% reduction in word error rate (WER) when tested on the TIMIT speech database.
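
To make the saturation problem concrete: MCE maps each sample's misclassification measure d (roughly, the competing-hypothesis score minus the reference score) through a class loss, and with the standard sigmoid the gradient vanishes once |d| is large, so those samples stop contributing to the parameter update. A small numpy illustration follows; the paper's replacement loss is not given in the abstract and is not reproduced here.

```python
# Standard sigmoid class loss used in MCE training and its gradient,
# showing why samples with large |d| are effectively ignored.
import numpy as np

def sigmoid_mce_loss(d: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Smoothed 0/1 loss: l(d) = 1 / (1 + exp(-gamma * d))."""
    return 1.0 / (1.0 + np.exp(-gamma * d))

def sigmoid_mce_grad(d: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """dl/dd = gamma * l(d) * (1 - l(d)); near zero for large |d|."""
    l = sigmoid_mce_loss(d, gamma)
    return gamma * l * (1.0 - l)

d = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_mce_grad(d))  # ~0 at |d| = 10: those samples barely train
```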

4 citations

Proceedings Article
01 Nov 2014
TL;DR: AdaBoost.MH can achieve an accuracy comparable to standard ANNs in this task, but lags behind recently-proposed Deep Neural Networks.
Abstract: In the phoneme classification task of speech recognition, Gaussian Mixture Models and Artificial Neural Networks are usually used. For other machine learning tasks, however, several other classification algorithms are also applied. One of them is AdaBoost.MH, reported to have high accuracy, which we tested for phoneme recognition on the well-known TIMIT dataset. We found that it can achieve an accuracy comparable to standard ANNs in this task, but lags behind recently proposed Deep Neural Networks. Based on our experimental results, we list a number of possible reasons why this might be so.
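
For a concrete feel of such an experiment, here is a rough stand-in using scikit-learn's AdaBoostClassifier over shallow decision trees. Note that scikit-learn implements SAMME-style multiclass boosting, not AdaBoost.MH itself, and the random features, label set, and hyperparameters below are placeholders rather than the paper's configuration (scikit-learn >= 1.2 is assumed for the estimator keyword).

```python
# Placeholder boosted phoneme classifier; random data stands in for real
# frame-level acoustic features and phoneme labels.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))      # stand-in for 39-dim MFCC+delta frames
y = rng.integers(0, 39, size=2000)   # stand-in for 39 phoneme classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                         n_estimators=200, learning_rate=0.5)
clf.fit(X_tr, y_tr)
print("frame accuracy:", clf.score(X_te, y_te))  # chance level on random data
```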

4 citations

Proceedings Article
06 Sep 2015
TL;DR: Distinct triphone modeling is investigated under the state-of-the-art deep neural network (DNN) framework, and the RMW approach is applied to linearly combine the neural network weight vectors of the member triphones of each tied-state before the output softmax activation for each distinct triphone state.
Abstract: To strike a balance between robust parameter estimation and detailed modeling, most automatic speech recognition systems are built using tied-state continuous density hidden Markov models (CDHMM). Consequently, states that are tied together in a tied-state are not distinguishable, inevitably introducing quantization errors. It has been shown that it is possible to model (almost) all distinct triphones effectively by using a basis approach; previously two methods were proposed: eigentriphone modeling and reference model weighting (RMW) in CDHMM using Gaussian-mixture states. In this paper, we investigate distinct triphone modeling under the state-of-the-art deep neural network (DNN) framework. Due to the large number of DNN model parameters, regularization is necessary. Multi-task learning (MTL) is first used to train distinct triphone states together with carefully chosen related tasks which serve as a regularizer. The RMW approach is then applied to linearly combine the neural network weight vectors of member triphones of each tied-state before the output softmax activation for each distinct triphone state. The method successfully improves phoneme recognition in TIMIT and word recognition in the Wall Street Journal task.
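
A toy numpy sketch of the RMW combination step as this abstract describes it: each distinct triphone's output weight vector is formed as a linear combination of the weight vectors of the member triphones sharing its tied-state, before the softmax. The shapes, the combination weights, and the sum-to-one normalization below are illustrative assumptions, not values from the paper.

```python
# Toy RMW combination before the output softmax.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_members = 512, 4
h = rng.normal(size=hidden_dim)                # last hidden-layer activation

# Output-layer weight vectors of the member triphones of one tied-state.
W_members = rng.normal(size=(n_members, hidden_dim))

# One row of learned combination weights per distinct triphone; the
# sum-to-one normalization is an assumption, not stated in the abstract.
A = np.array([[0.55, 0.25, 0.15, 0.05],
              [0.10, 0.60, 0.20, 0.10]])

logits = (A @ W_members) @ h                   # one logit per distinct triphone
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax over these outputs
print(probs)
```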

4 citations

Proceedings Article
01 Jan 1999
TL;DR: This paper presents a text-independent speaker recognition system based on the voiced segments of the speech signal that uses feedforward MLP classification with only a limited amount of training and testing data and gives a comparatively high accuracy.
Abstract: This paper presents a text-independent speaker recognition system based on the voiced segments of the speech signal. The proposed system uses feedforward MLP classification with only a limited amount of training and testing data and gives comparatively high accuracy. The techniques employed are: RASTA-PLP speech analysis for parameter estimation, a feedforward MLP for voiced/unvoiced segmentation, and a large number (equal to the number of speakers) of simple MLPs for the classification procedure. The system has been trained and tested using the TIMIT and NTIMIT databases. The verification experiments showed high accuracy: above 99% for clean speech (TIMIT) and 74.7% for noisy speech (NTIMIT). Additional experiments compared the proposed voiced-segment approach against using only vowels and against using all phonetic categories, with results favoring the use of voiced segments.
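
The architecture condenses to three pieces: frame-level features, a voiced/unvoiced MLP, and one small MLP per enrolled speaker. A hedged scikit-learn sketch follows; RASTA-PLP extraction is assumed to happen elsewhere (scikit-learn does not provide it), and the random arrays stand in for real features and labels.

```python
# Hedged sketch of the three-stage verification system.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
D = 13                                   # stand-in feature dimensionality

# 1) Voiced/unvoiced segmentation MLP (binary, frame level).
vuv = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
vuv.fit(rng.normal(size=(500, D)), rng.integers(0, 2, size=500))

# 2) One target-vs-impostor MLP per enrolled speaker.
speaker_mlps = {}
for spk in ("spk01", "spk02"):           # hypothetical speaker IDs
    m = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
    m.fit(rng.normal(size=(400, D)), rng.integers(0, 2, size=400))
    speaker_mlps[spk] = m

# 3) Verification: score only the frames the V/UV MLP marks as voiced.
def verify(frames: np.ndarray, claimed: str, threshold: float = 0.5) -> bool:
    voiced = frames[vuv.predict(frames) == 1]
    if len(voiced) == 0:
        return False                     # nothing voiced to score
    score = speaker_mlps[claimed].predict_proba(voiced)[:, 1].mean()
    return bool(score > threshold)

print(verify(rng.normal(size=(100, D)), "spk01"))
```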

3 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95