Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis
Papers
22 Sep 2008
TL;DR: A set of novel duration features for detecting pitch accent and phrase boundaries, depending on articulatory timing rather than segmental duration information, is presented.
Abstract: This paper presents a set of novel duration features for detecting pitch accent and phrase boundaries, which depend on articulatory timing rather than segmental duration information. The features are computed from the detected syllable nuclei and boundaries, using peaks and valleys in an energy contour, but also leveraging information from a simple HMM phone-manner-class recognizer to increase recall. In experiments on the hand-segmented TIMIT corpus, we obtain greater than 90% F-measure for vowel detection. In prosody detection experiments on the BU Radio News corpus, compared to a segmental-feature baseline, the new features yield similar performance for pitch accent detection and slightly worse boundary detection, without the need for phonetic alignments.
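The syllable-nucleus detection via peaks and valleys in an energy contour can be sketched as below; the function name and the `min_height` threshold are illustrative assumptions, not parameters from the paper.

```python
def energy_peaks(energy, min_height):
    """Indices of local maxima in a per-frame energy contour that exceed
    min_height -- candidate syllable nuclei. Local minima (valleys) between
    successive peaks would then serve as syllable boundary candidates."""
    peaks = []
    for i in range(1, len(energy) - 1):
        if (energy[i] > energy[i - 1]
                and energy[i] >= energy[i + 1]
                and energy[i] >= min_height):
            peaks.append(i)
    return peaks
```

A usable system would compute `energy` from short-time frames of the waveform and tune the threshold on held-out data.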
3 citations
01 Nov 2018
TL;DR: For the task of unsupervised query-by-example spoken term detection (QbE-STD), features extracted by a Self-Organizing Map (SOM) are concatenated with features learned by an unsupervised GMM-based model at the feature level to enhance performance.
Abstract: In the task of unsupervised query-by-example spoken term detection (QbE-STD), we concatenate the features extracted by a Self-Organizing Map (SOM) with features learned by an unsupervised GMM-based model at the feature level to enhance performance. More specifically, the SOM features are represented by the distances between the current feature vector and the weight vectors of SOM neurons learned in an unsupervised manner. After extracting these features, we apply sub-sequence Dynamic Time Warping (S-DTW) to detect the occurrences of keywords in the test data. We evaluate the performance of these features on the TIMIT English database. After concatenating the SOM features and the GMM-based features, we achieve average improvements of 7.77% and 7.74% in Mean Average Precision (MAP) and P@10, respectively.
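Sub-sequence DTW lets a short query match anywhere inside a longer test utterance by allowing the alignment to start and end at any reference position. A minimal sketch over 1-D sequences follows; the paper operates on multidimensional feature vectors with an appropriate frame distance, so the scalar `dist` here is an illustrative simplification.

```python
def subsequence_dtw(query, ref, dist=lambda a, b: abs(a - b)):
    """Return (best cost, end index in ref) of the cheapest warping path
    aligning the whole query against any contiguous subsequence of ref."""
    n, m = len(query), len(ref)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    # Subsequence matching: the path may start at any column for free.
    for j in range(m + 1):
        D[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(query[i - 1], ref[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # The path may also end at any column: take the cheapest last-row cell.
    end = min(range(1, m + 1), key=lambda j: D[n][j])
    return D[n][end], end - 1
```

Sliding this over each test utterance and ranking the resulting costs yields the keyword detections that MAP and P@10 are computed from.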
3 citations
TL;DR: The experimental results show that this algorithm can effectively filter noise from speech and significantly improve the performance of an automatic speech recognition system, and it is shown to be robust under various noisy environments and Signal-to-Noise Ratio (SNR) conditions.
Abstract: As many traditional de-noising methods fail in intense noise and do not adapt to varying noisy environments, a speech enhancement method based on dynamic Fractional Fourier Transform (FRFT) filtering is proposed. The acoustic signal is framed, a method for updating the optimal FRFT dispersion order of the noisy speech is presented, and the approach is implemented in detail. In experiments with TIMIT reference speech and Noisex-92 noise, the results show that the algorithm can effectively filter noise from speech and significantly improve the performance of an automatic speech recognition system. It is shown to be robust under various noisy environments and Signal-to-Noise Ratio (SNR) conditions, has low computational complexity, and is simple to implement. http://dx.doi.org/10.11591/telkomnika.v12i12.6694
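The per-frame pipeline (frame the signal, filter each frame in the FRFT domain, reconstruct) begins with a framing step like the sketch below. The Python standard library has no FRFT, so only the framing stage is shown; the function name and parameters are assumptions for illustration.

```python
def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples,
    advancing by hop samples and zero-padding a short final frame."""
    frames = []
    for start in range(0, max(len(x) - frame_len, 0) + 1, hop):
        f = list(x[start:start + frame_len])
        frames.append(f + [0.0] * (frame_len - len(f)))
    return frames
```

Each frame would then be transformed at the current optimal FRFT order, filtered, inverse-transformed, and overlap-added back into an enhanced signal.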
3 citations
12 Oct 1998
TL;DR: The multinet phone classifier architecture is a framework for combining specialised phone detection networks into a posterior probability estimator for all phones; it is compared with a standard mixture-of-Gaussians HMM classifier.
Abstract: The multinet phone classifier architecture is a framework for combining specialised phone detection networks into a posterior probability estimator for all phones. In this paper we give results obtained for the architecture on TIMIT phone classification tasks. We compare it with a standard mixture-of-Gaussians HMM classifier.
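One simple way to turn per-phone detector scores into a posterior-style estimate over the full phone set is to normalize them to sum to one. This is an illustrative sketch under that assumption, not the paper's actual combination rule.

```python
def to_posteriors(detector_scores):
    """Normalize non-negative per-phone detector scores so they sum to 1,
    giving a posterior-like distribution over all phones."""
    total = sum(detector_scores.values())
    if total == 0:
        # Fall back to a uniform distribution when every detector is silent.
        n = len(detector_scores)
        return {ph: 1.0 / n for ph in detector_scores}
    return {ph: s / total for ph, s in detector_scores.items()}
```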
3 citations
TL;DR: In this paper, the authors combine complex Gabor filters with complex-valued deep neural networks, replacing the usual CNN weight kernels, to take full advantage of the Gabor filter's optimal time-frequency resolution and of the complex domain.
Abstract: Convolutional Neural Networks (CNNs) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer, lossless input signal. Recent research proposes to inject prior acoustic knowledge into the first convolutional layer by integrating the shape of the impulse responses, in order to increase both the interpretability of the learnt acoustic model and its performance. We propose to combine the complex Gabor filter with complex-valued deep neural networks to replace the usual CNN weight kernels, to take full advantage of its optimal time-frequency resolution and of the complex domain. Experiments on the TIMIT phoneme recognition task show that the proposed approach reaches top-of-the-line performance while remaining interpretable.
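A complex Gabor kernel is a Gaussian envelope modulating a complex exponential; a minimal sketch of generating one tap vector follows. Parameter names and the (unnormalized) parameterization are illustrative assumptions, not the paper's exact formulation.

```python
import cmath
import math

def gabor_kernel(length, center_freq, sigma):
    """Complex Gabor filter taps: a Gaussian envelope of width sigma
    (in samples) times a complex exponential at center_freq
    (cycles/sample), both centred on the kernel midpoint."""
    mid = (length - 1) / 2.0
    return [
        math.exp(-((n - mid) ** 2) / (2.0 * sigma ** 2))
        * cmath.exp(2j * math.pi * center_freq * (n - mid))
        for n in range(length)
    ]
```

In the approach described above, parameters like `center_freq` and `sigma` would be learned jointly with the rest of the complex-valued network rather than fixed.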
3 citations