Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1401 publications have appeared within this topic, receiving 59888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: Compared with state-of-the-art approaches, PSRBL significantly reduces the time spent on both training and recognition for the speech model, under the premise that PSRBL and the compared approaches offer consistent privacy preservation of speech data.
Abstract: Utilizing speech as the transmission medium in the Internet of Things (IoT) is an effective way to reduce latency while improving the efficiency of human-machine interaction. In the field of speech recognition, Recurrent Neural Networks (RNNs) have significant advantages in improving recognition accuracy. However, some RNN-based intelligent speech recognition applications do not adequately preserve the privacy of speech data, and others that do preserve privacy are time-consuming, especially in model training and speech recognition. Therefore, in this paper we propose a novel Privacy-preserving Speech Recognition framework using a Bidirectional Long Short-Term Memory neural network, namely PSRBL. On the one hand, PSRBL designs new functions that, combined with an additive secret sharing protocol, form secure activation functions, namely a secure piecewise-linear Sigmoid and a secure piecewise-linear Tanh, to preserve the privacy of speech data while speech recognition runs on edge servers. On the other hand, in order to reduce the time spent on both training and recognition while keeping high accuracy, PSRBL first uses the secure activation functions to refit the original activation functions in the bidirectional Long Short-Term Memory (LSTM) neural network, and then makes full use of the left and right context of the speech data by employing the bidirectional LSTM. Experiments conducted on the TIMIT speech dataset show that PSRBL performs well. Specifically, compared with state-of-the-art approaches, PSRBL significantly reduces the time spent on both training and recognition, under the premise that PSRBL and the compared approaches offer consistent privacy preservation of speech data.

15 citations
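
The PSRBL abstract above names two building blocks: additive secret sharing and piecewise-linear replacements for Sigmoid and Tanh. The sketch below illustrates both in generic form. It is not the PSRBL construction: the modulus, the function names, and the specific "hard" sigmoid and tanh segments are assumptions chosen for illustration, since the abstract does not give the paper's actual definitions.

```python
import secrets

# Hypothetical ring size for the shares; not taken from the paper.
MODULUS = 2**61 - 1


def share(value):
    """Split an integer into two additive shares modulo MODULUS."""
    first = secrets.randbelow(MODULUS)
    second = (value - first) % MODULUS
    return first, second


def reconstruct(first, second):
    """Recombine two additive shares into the original value."""
    return (first + second) % MODULUS


def add_shares(x, y):
    """Add two shared values; each party only touches its own share."""
    return (x[0] + y[0]) % MODULUS, (x[1] + y[1]) % MODULUS


def piecewise_linear_sigmoid(x):
    """Generic 'hard sigmoid': 0.25 * x + 0.5 clamped to [0, 1].

    An illustrative approximation, not the paper's secure Sigmoid.
    """
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def piecewise_linear_tanh(x):
    """Generic 'hard tanh': x clamped to [-1, 1].

    An illustrative approximation, not the paper's secure Tanh.
    """
    return min(1.0, max(-1.0, x))


if __name__ == "__main__":
    a, b = share(5), share(7)
    print(reconstruct(*add_shares(a, b)))
    print(piecewise_linear_sigmoid(0.0), piecewise_linear_tanh(0.3))
```

Piecewise-linear forms are a natural fit for this setting because each linear segment needs only additions and multiplications by public constants, which parties can apply to their own shares locally. Selecting the right segment still requires a secure comparison step, which this sketch does not attempt.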

Journal ArticleDOI
TL;DR: It is found that there are cases where a conventional VQ-based system outperforms the modern systems, and that the impact of distance metrics on the performance of the conventional and modern systems depends on the recognition task imposed (verification or identification).

15 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, a spike-based sound coding technique has been presented where the spikes are similar to the spikes exhibited by type 1 fibers of the auditory nerve, which can provide suitable input for a spiking neural network, as well as maintaining the accurate time structure of sound.
Abstract: A spike (event) based sound coding technique is presented in this study, where the spikes are similar to those exhibited by type 1 fibers of the auditory nerve. This lossy coding technique has already been shown to be useful for inter-aural time difference based sound source direction finding. Here, we show that decoding and resynthesising this code can produce intelligible speech even using a small number of spike trains. We used a few composite techniques, including speaker verification, to assess the effectiveness of the coding technique on a large number of TIMIT sentences. This biologically inspired coding technique can provide suitable input for a spiking neural network while maintaining the accurate time structure of sound.

15 citations

Proceedings ArticleDOI
01 Aug 2020
TL;DR: A novel method for extracting features from speech audio to recognize gender as male or female; the highest accuracy, 96.8%, was obtained on the TIMIT dataset with KNN, compared with the other two datasets.
Abstract: Speech recognition systems have made interaction between humans and machines practical and friendly. Gender identification systems are used in many fields, such as security systems, robotics, artificial intelligence, and call centers. This paper presents a novel method for extracting features from speech audio to recognize gender as male or female. First, we pre-process the data to obtain smooth, noise-free data. We then feed this pre-processed data into a multi-layer architecture model to extract the features. In the first layer, we calculate the fundamental frequency using the autocorrelation function, along with spectral entropy, spectral flatness, and mode frequency. In the second layer, we use a linear interpolation function to map the pre-processed data into a suitable range and use Mel Frequency Cepstral Coefficients (MFCC) to extract features from the mapped data. Three datasets (TIMIT, RAVDESS, and the self-created BGC) and two machine learning classifiers (K-Nearest Neighbors (KNN) and Support Vector Machine (SVM)) are used to substantiate the accuracy of the proposed model. The highest accuracy, 96.8%, was obtained on the TIMIT dataset with KNN, compared with the other two datasets.

15 citations
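
The first-layer features named in the gender-recognition abstract above (fundamental frequency via autocorrelation, spectral entropy, spectral flatness) can be sketched with textbook definitions. This is a minimal illustration, not the authors' implementation: the function names, the 50 to 400 Hz pitch search range, and the use of base-2 entropy are all assumptions, and the power spectrum is taken as a precomputed input.

```python
import math


def autocorr_f0(signal, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate fundamental frequency from the lag with the highest autocorrelation.

    The search range f_min..f_max is an assumed typical pitch range.
    """
    n = len(signal)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def spectral_entropy(power_spectrum):
    """Shannon entropy (in bits) of the power spectrum normalized to sum to 1."""
    total = sum(power_spectrum)
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * math.log2(p) for p in probs)


def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean; expects strictly positive values."""
    n = len(power_spectrum)
    geometric = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arithmetic = sum(power_spectrum) / n
    return geometric / arithmetic


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
    print(autocorr_f0(tone, rate))
    print(spectral_entropy([1.0, 1.0, 1.0, 1.0]))
    print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))
```

A flat spectrum gives flatness near 1 and maximal entropy, while a strongly tonal spectrum drives both toward 0, which is why these measures are often paired with pitch when characterizing voiced speech.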

Proceedings ArticleDOI
19 Apr 1994
TL;DR: The phoneme class directed enhancement algorithm is evaluated using TIMIT speech data, and shown to result in substantial improvement in objective speech quality over a range of signal-to-noise ratios and individual phoneme classes.
Abstract: It is known that degrading acoustic noise influences speech quality across phoneme classes in a non-uniform manner. This results in variable quality performance for many speech enhancement algorithms in noisy environments. To address this, a hidden-Markov-model phoneme classification procedure is proposed which directs single channel speech enhancement across individual phoneme classes. The procedure performs broad phoneme class partitioning of noisy speech frames using a continuous-mixture hidden-Markov-model recognizer in conjunction with a cost based decision process. Cost functions are assigned which weigh errors between phoneme classes that are perceptually different (e.g., vowels versus fricatives, etc.). Once noisy speech frames are partitioned, iterative speech enhancement based on all-pole parameter estimation with inter and intra-frame spectral constraints (Auto:I,LSP:T) is employed. The phoneme class directed enhancement algorithm is evaluated using TIMIT speech data, and shown to result in substantial improvement in objective speech quality over a range of signal-to-noise ratios and individual phoneme classes. The algorithm is also shown to possess consistent quality improvement in a speaker independent scenario.

15 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95