Topic

TIMIT

About: TIMIT (also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus) is a corpus of read American English speech widely used to benchmark speech and speaker recognition systems. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations.


Papers
Book Chapter
Tingting Wu, Diqun Yan, Li Xiang, Rangding Wang
01 Jan 2020
TL;DR: This study proposes a universal forensic algorithm that can detect four typical speech operations (pitch shifting, noise adding, low-pass filtering, and high-pass filtering) and demonstrates the algorithm's robustness against the MP3 compression attack.
Abstract: Most existing speech forensic works implicitly assume the suspected speech either has or has not been processed by a specific operation. In practice, however, the operation type performed on the target speech is usually unknown to the forensic analyst, and in most cases, multiple operations may be involved in order to conceal the forgery trace. Few works have considered these issues. In this study, we propose a universal forensic algorithm that can detect four typical speech operations: pitch shifting, noise adding, low-pass filtering, and high-pass filtering. The motivation of the proposed algorithm is based on the observation that different operations cause different effects on Mel-frequency cepstral coefficients (MFCC). The statistical moments of MFCC are extracted as detection features. Additionally, cepstral mean and variance normalization (CMVN), a computationally efficient normalization technique, is used to eliminate the impact of channel noise. Finally, an ensemble of binary classifiers is used to detect the operation type, and multiclass classifiers are adopted to identify the order of operations. The experimental results on the TIMIT and UME-ERJ datasets show that the proposed forensic features achieve good performance on operation type and order detection. Additionally, the results demonstrate the robustness of the proposed algorithm against the MP3 compression attack.
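
A minimal sketch of the feature pipeline this abstract describes, assuming librosa for MFCC extraction. The abstract does not specify which statistical moments are used or over which coefficients they are taken; since per-utterance CMVN forces each static coefficient to zero mean and unit variance, this sketch takes the moments of the delta-MFCC trajectories instead, which is an assumption rather than the paper's specification.

```python
# Hedged sketch: MFCCs -> per-utterance CMVN -> statistical moments.
# librosa and the moment set (mean/var/skew/kurtosis of delta-MFCCs)
# are assumptions; the abstract only names MFCC moments and CMVN.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def forensic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, T)

    # CMVN: normalize each coefficient trajectory over time to suppress
    # the channel component.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) \
        / (mfcc.std(axis=1, keepdims=True) + 1e-8)

    # After CMVN the static mean/variance are trivial, so take moments of
    # the first-order deltas (an assumption; see the note above).
    d = librosa.feature.delta(mfcc)
    return np.concatenate([d.mean(axis=1), d.var(axis=1),
                           skew(d, axis=1), kurtosis(d, axis=1)])
```

Each per-utterance vector would then feed the ensemble of binary classifiers (operation type) and the multiclass classifiers (operation order) described in the abstract.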

4 citations

Proceedings Article
01 Dec 2012
TL;DR: The experimental results show that the proposed method outperformed the baseline system based on both the maximum likelihood estimation (MLE) and sigmoid-based MCE and achieved a reduction in the word error rate (WER) of 28.9% when tested on the TIMIT speech database.
Abstract: This paper proposes a new class loss function as an alternative to the standard sigmoid class loss function for optimizing the parameters of decoding graphs using discriminative training based on the minimum classification error (MCE) criterion. The standard sigmoid-based approach tends to ignore a significant number of training samples that have a large difference between the scores of the reference and their corresponding competing hypotheses, which degrades parameter optimization. The proposed function overcomes this limitation by considering almost all the training samples and thus improves parameter optimization on large decoding graphs. The decoding graph used in this research is an integrated network of weighted finite state transducers. The primary task examined is a 64K-word continuous speech recognition task. The experimental results show that the proposed method outperformed baseline systems based on both maximum likelihood estimation (MLE) and sigmoid-based MCE, achieving a 28.9% reduction in word error rate (WER) when tested on the TIMIT speech database.
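
To make the saturation problem concrete: MCE maps each sample's misclassification measure d (roughly, the competing-hypothesis score minus the reference score) through a class loss, and with the standard sigmoid the gradient vanishes once |d| is large, so those samples stop contributing to the parameter update. A small numpy illustration follows; the paper's replacement loss is not given in the abstract and is not reproduced here.

```python
# Standard sigmoid class loss used in MCE training and its gradient,
# showing why samples with large |d| are effectively ignored.
import numpy as np

def sigmoid_mce_loss(d: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Smoothed 0/1 loss: l(d) = 1 / (1 + exp(-gamma * d))."""
    return 1.0 / (1.0 + np.exp(-gamma * d))

def sigmoid_mce_grad(d: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """dl/dd = gamma * l(d) * (1 - l(d)); near zero for large |d|."""
    l = sigmoid_mce_loss(d, gamma)
    return gamma * l * (1.0 - l)

d = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_mce_grad(d))  # ~0 at |d| = 10: those samples barely train
```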

4 citations

Proceedings Article
01 Nov 2014
TL;DR: AdaBoost.MH can achieve an accuracy comparable to standard ANNs in this task, but lags behind recently-proposed Deep Neural Networks.
Abstract: In the phoneme classification task of speech recognition, Gaussian Mixture Models and Artificial Neural Networks are usually used. For other machine learning tasks, however, several other classification algorithms are also applied. One of them is AdaBoost.MH, reported to have high accuracy, which we tested for phoneme recognition on the well-known TIMIT dataset. We found that it can achieve an accuracy comparable to standard ANNs in this task, but lags behind recently proposed Deep Neural Networks. Based on our experimental results, we list a number of possible reasons why this might be so.
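
For a concrete feel of such an experiment, here is a rough stand-in using scikit-learn's AdaBoostClassifier over shallow decision trees. Note that scikit-learn implements SAMME-style multiclass boosting, not AdaBoost.MH itself, and the random features, label set, and hyperparameters below are placeholders rather than the paper's configuration (scikit-learn >= 1.2 is assumed for the estimator keyword).

```python
# Placeholder boosted phoneme classifier; random data stands in for real
# frame-level acoustic features and phoneme labels.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))      # stand-in for 39-dim MFCC+delta frames
y = rng.integers(0, 39, size=2000)   # stand-in for 39 phoneme classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                         n_estimators=200, learning_rate=0.5)
clf.fit(X_tr, y_tr)
print("frame accuracy:", clf.score(X_te, y_te))  # chance level on random data
```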

4 citations

Proceedings Article
06 Sep 2015
TL;DR: Distinct triphone modeling is investigated under the state-of-the-art deep neural network (DNN) framework, and the RMW approach is applied to linearly combine the neural network weight vectors of the member triphones of each tied-state before the output softmax activation for each distinct triphone state.
Abstract: To strike a balance between robust parameter estimation and detailed modeling, most automatic speech recognition systems are built using tied-state continuous density hidden Markov models (CDHMM). Consequently, states that are tied together in a tied-state are not distinguishable, inevitably introducing quantization errors. It has been shown that it is possible to model (almost) all distinct triphones effectively by using a basis approach; previously two methods were proposed: eigentriphone modeling and reference model weighting (RMW) in CDHMM using Gaussian-mixture states. In this paper, we investigate distinct triphone modeling under the state-of-the-art deep neural network (DNN) framework. Due to the large number of DNN model parameters, regularization is necessary. Multi-task learning (MTL) is first used to train distinct triphone states together with carefully chosen related tasks which serve as a regularizer. The RMW approach is then applied to linearly combine the neural network weight vectors of member triphones of each tied-state before the output softmax activation for each distinct triphone state. The method successfully improves phoneme recognition in TIMIT and word recognition in the Wall Street Journal task.
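
A toy numpy sketch of the RMW combination step as this abstract describes it: each distinct triphone's output weight vector is formed as a linear combination of the weight vectors of the member triphones sharing its tied-state, before the softmax. The shapes, the combination weights, and the sum-to-one normalization below are illustrative assumptions, not values from the paper.

```python
# Toy RMW combination before the output softmax.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_members = 512, 4
h = rng.normal(size=hidden_dim)                # last hidden-layer activation

# Output-layer weight vectors of the member triphones of one tied-state.
W_members = rng.normal(size=(n_members, hidden_dim))

# One row of learned combination weights per distinct triphone; the
# sum-to-one normalization is an assumption, not stated in the abstract.
A = np.array([[0.55, 0.25, 0.15, 0.05],
              [0.10, 0.60, 0.20, 0.10]])

logits = (A @ W_members) @ h                   # one logit per distinct triphone
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax over these outputs
print(probs)
```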

4 citations

Proceedings Article
01 Jan 1999
TL;DR: This paper presents a text-independent speaker recognition system based on the voiced segments of the speech signal that uses feedforward MLP classification with only a limited amount of training and testing data and gives a comparatively high accuracy.
Abstract: This paper presents a text-independent speaker recognition system based on the voiced segments of the speech signal. The proposed system uses feedforward MLP classification with only a limited amount of training and testing data and gives comparatively high accuracy. The techniques employed are: RASTA-PLP speech analysis for parameter estimation, a feedforward MLP for voiced/unvoiced segmentation, and a large number (equal to the number of speakers) of simple MLPs for the classification procedure. The system has been trained and tested using the TIMIT and NTIMIT databases. The verification experiments showed high accuracy: above 99% for clean speech (TIMIT) and 74.7% for noisy speech (NTIMIT). Additional experiments compared the proposed voiced-segment approach against using only vowels and against using all phonetic categories, with results favoring the use of voiced segments.
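
The architecture condenses to three pieces: frame-level features, a voiced/unvoiced MLP, and one small MLP per enrolled speaker. A hedged scikit-learn sketch follows; RASTA-PLP extraction is assumed to happen elsewhere (scikit-learn does not provide it), and the random arrays stand in for real features and labels.

```python
# Hedged sketch of the three-stage verification system.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
D = 13                                   # stand-in feature dimensionality

# 1) Voiced/unvoiced segmentation MLP (binary, frame level).
vuv = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
vuv.fit(rng.normal(size=(500, D)), rng.integers(0, 2, size=500))

# 2) One target-vs-impostor MLP per enrolled speaker.
speaker_mlps = {}
for spk in ("spk01", "spk02"):           # hypothetical speaker IDs
    m = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
    m.fit(rng.normal(size=(400, D)), rng.integers(0, 2, size=400))
    speaker_mlps[spk] = m

# 3) Verification: score only the frames the V/UV MLP marks as voiced.
def verify(frames: np.ndarray, claimed: str, threshold: float = 0.5) -> bool:
    voiced = frames[vuv.predict(frames) == 1]
    if len(voiced) == 0:
        return False                     # nothing voiced to score
    score = speaker_mlps[claimed].predict_proba(voiced)[:, 1].mean()
    return bool(score > threshold)

print(verify(rng.normal(size=(100, D)), "spk01"))
```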

3 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95