Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: Proposes the complex signal approximation (cSA), which operates in the complex domain to utilize the phase information of the desired speech signal and improve separation performance.
Abstract: In recent research, deep neural network (DNN) has been used to solve the monaural source separation problem. According to the training objectives, DNN-based monaural speech separation is categorized into three aspects, namely masking, mapping, and signal approximation based techniques. However, the performance of the traditional methods is not robust due to variations in real-world environments. Moreover, vanilla DNN-based methods cannot fully exploit temporal information. Therefore, in this paper, the long short-term memory (LSTM) neural network is applied to exploit long-term speech contexts. We then propose the complex signal approximation (cSA), which operates in the complex domain to utilize the phase information of the desired speech signal and improve the separation performance. The IEEE and the TIMIT corpora are used to generate mixtures with noise and speech interferences to evaluate the efficacy of the proposed method. The experimental results demonstrate the advantages of the proposed cSA-based LSTM recurrent neural network method in terms of different objective performance measures.

29 citations
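As a rough illustration of the cSA idea described in the abstract above, the sketch below (assuming PyTorch; layer sizes, STFT shapes, and the loss form are illustrative assumptions, not the paper's configuration) has an LSTM predict the real and imaginary parts of a complex mask, applies that mask to the mixture's complex STFT, and penalizes the difference from the clean complex spectrum, so phase errors contribute to the loss.

```python
# Minimal sketch of complex signal approximation (cSA) with an LSTM,
# assuming PyTorch; layer sizes and STFT settings are illustrative only.
import torch
import torch.nn as nn

class CSALstm(nn.Module):
    def __init__(self, n_freq=257, hidden=512, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=layers, batch_first=True)
        # Two heads: real and imaginary parts of a complex ratio mask.
        self.mask_re = nn.Linear(hidden, n_freq)
        self.mask_im = nn.Linear(hidden, n_freq)

    def forward(self, mix_mag):          # mix_mag: (batch, frames, n_freq)
        h, _ = self.lstm(mix_mag)
        return self.mask_re(h), self.mask_im(h)

def csa_loss(model, mix_stft, clean_stft):
    """Complex signal approximation: compare the masked mixture with the
    clean complex spectrum, so phase errors are penalized as well."""
    m_re, m_im = model(mix_stft.abs())
    est = torch.complex(m_re, m_im) * mix_stft          # complex masking
    return (est - clean_stft).abs().pow(2).mean()

# Toy usage with random tensors standing in for STFTs (batch=4, frames=100, bins=257).
if __name__ == "__main__":
    model = CSALstm()
    mix = torch.randn(4, 100, 257, dtype=torch.cfloat)
    clean = torch.randn(4, 100, 257, dtype=torch.cfloat)
    loss = csa_loss(model, mix, clean)
    loss.backward()
```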

Journal ArticleDOI
TL;DR: This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition built from three subnets in concatenation, the last of which is a multilayer perceptron (MLP); the architectures of the subnets are described and the associated adaptive learning algorithms are derived.
Abstract: This paper presents an artificial neural network (ANN) for speaker-independent isolated word speech recognition. The network consists of three subnets in concatenation. The static information within one frame of speech signal is processed in the probabilistic mapping subnet that converts an input vector of acoustic features into a probability vector whose components are estimated probabilities of the feature vector belonging to the phonetic classes that constitute the words in the vocabulary. The dynamics capturing subnet computes the first-order cross correlation between the components of the probability vectors to serve as the discriminative feature derived from the interframe temporal information of the speech signal. These dynamic features are passed for decision-making to the classification subnet, which is a multilayer perceptron (MLP). The architectures of these three subnets are described, and the associated adaptive learning algorithms are derived. The recognition results for a subset of the DARPA TIMIT speech database are reported. The correct recognition rate of the proposed ANN system is 95.5%, whereas that of the best of continuous hidden Markov model (HMM)-based systems is only 91.0%.

29 citations
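The dynamics-capturing step of the three-subnet design above can be sketched as follows. This is a hypothetical NumPy reading of "first-order cross correlation between the components of the probability vectors" (averaging outer products of adjacent frame pairs), not the authors' implementation; the frame count and number of phonetic classes are made up.

```python
# Illustrative sketch (not the paper's code) of the dynamics-capturing step:
# first-order cross correlation between components of per-frame probability
# vectors, flattened into a fixed-length feature for an MLP classifier.
import numpy as np

def dynamic_features(prob_frames: np.ndarray) -> np.ndarray:
    """prob_frames: (T, C) array, each row a probability vector over C
    phonetic classes. Returns the C*C cross-correlation feature averaged
    over adjacent frame pairs (one common reading of 'first-order')."""
    T, C = prob_frames.shape
    corr = np.zeros((C, C))
    for t in range(T - 1):
        corr += np.outer(prob_frames[t], prob_frames[t + 1])
    return (corr / max(T - 1, 1)).ravel()

# Example: 20 frames over 10 phonetic classes -> a 100-dimensional dynamic feature.
probs = np.random.dirichlet(np.ones(10), size=20)
features = dynamic_features(probs)
print(features.shape)   # (100,)
```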

Proceedings ArticleDOI
15 Mar 1999
TL;DR: Neural network based adaptation methods are applied to telephone speech recognition, and a new unsupervised model adaptation method is proposed that does not require transcriptions and can be used with the neural networks.
Abstract: The performance of well-trained speech recognizers using high quality full bandwidth speech data is usually degraded when used in real world environments. In particular, telephone speech recognition is extremely difficult due to the limited bandwidth of the transmission channels. In this paper, neural network based adaptation methods are applied to telephone speech recognition and a new unsupervised model adaptation method is proposed. The advantage of the neural network based approach is that the retraining of speech recognizers for telephone speech is avoided. Furthermore, because the multi-layer neural network is able to compute nonlinear functions, it can accommodate the nonlinear mapping between full bandwidth speech and telephone speech. The new unsupervised model adaptation method does not require transcriptions and can be used with the neural networks. Experimental results on the TIMIT/NTIMIT corpora show that the performance of the proposed methods is comparable to that of recognizers retrained on telephone speech.

29 citations
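A minimal sketch of the feature-mapping idea in the entry above, assuming PyTorch and assuming 13-dimensional cepstral features: a small MLP learns a nonlinear mapping from telephone-band features toward full-bandwidth features, so the original recognizer need not be retrained. The network size, feature choice, and training loop below are illustrative, not the paper's configuration.

```python
# Sketch (assumed setup, not the paper's code): an MLP maps telephone-band
# features to full-bandwidth features learned from parallel frames, e.g. a
# TIMIT utterance and its NTIMIT telephone counterpart.
import torch
import torch.nn as nn

mapper = nn.Sequential(
    nn.Linear(13, 128), nn.Tanh(),      # 13-dim cepstral features (assumption)
    nn.Linear(128, 13),
)

opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(tel_feats, wide_feats):
    """tel_feats / wide_feats: parallel (batch, 13) feature frames."""
    opt.zero_grad()
    loss = loss_fn(mapper(tel_feats), wide_feats)
    loss.backward()
    opt.step()
    return loss.item()

# One toy step with random stand-in features.
print(train_step(torch.randn(32, 13), torch.randn(32, 13)))
```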

Journal ArticleDOI
TL;DR: In this article, a 2D analysis framework using 2D transformations of the time-frequency space is proposed to obtain an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency.
Abstract: This paper considers the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological studies implicating the use of pitch dynamics in speech by humans. We develop and assess signal processing schemes aimed at exploiting temporal change of pitch to address the high-pitch formant frequency estimation problem. Specifically, we propose a 2-D analysis framework using 2-D transformations of the time-frequency space. In one approach, we project changing spectral harmonics over time to a 1-D function of frequency. In a second approach, we draw upon previous work of Quatieri and Ezzat, with similarities to the auditory modeling efforts of Chi, where localized 2-D Fourier transforms of the time-frequency space provide improved source-filter separation when pitch is changing. Our methods show quantitative improvements for synthesized vowels with stationary formant structure in comparison to traditional and homomorphic linear prediction. We also demonstrate the feasibility of applying our methods on stationary vowel regions of natural speech spoken by high-pitch females of the TIMIT corpus. Finally, we show improvements afforded by the proposed analysis framework in formant tracking on examples of stationary and time-varying formant structure.

29 citations
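The localized 2-D transform idea in the abstract above can be pictured with the sketch below (NumPy/SciPy): compute a narrowband log-spectrogram, then take windowed 2-D FFTs of small time-frequency patches. The STFT settings, patch sizes, and windows are assumptions for illustration, not the parameters used in the paper.

```python
# Sketch of localized 2-D Fourier analysis of the time-frequency space;
# analysis parameters are illustrative assumptions.
import numpy as np
from scipy.signal import stft, get_window

def localized_2d_transforms(x, fs, patch=(40, 40), hop=(20, 20)):
    _, _, Z = stft(x, fs=fs, nperseg=512, noverlap=480)   # narrowband analysis
    S = np.log(np.abs(Z) + 1e-8)                          # log-spectrogram
    wf = get_window("hann", patch[0])[:, None]            # frequency window
    wt = get_window("hann", patch[1])[None, :]            # time window
    patches = []
    for f0 in range(0, S.shape[0] - patch[0], hop[0]):
        for t0 in range(0, S.shape[1] - patch[1], hop[1]):
            block = S[f0:f0 + patch[0], t0:t0 + patch[1]] * wf * wt
            patches.append(np.fft.fftshift(np.fft.fft2(block)))
    return patches

# Toy usage on one second of noise standing in for a vowel segment.
patches = localized_2d_transforms(np.random.randn(16000), fs=16000)
print(len(patches), patches[0].shape)
```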

01 Jan 2001
TL;DR: By selecting an appropriate distance measure, an automated procedure to map phonemes from a source language (English) to a target language (Afrikaans) can be applied, with recognition results comparable to a manual mapping process undertaken by a phonetic expert.
Abstract: This paper explores an automated approach to mapping one phoneme set to another, based on the acoustic distances of the individual phonemes. The main goal of this investigation is to automate the technique for creating initial/baseline acoustic models for a new language. Using this technique, it would be possible to rapidly build speech recognition systems for a variety of languages. A subsidiary objective of this investigation is to compare different acoustic distance measures and to assess their ability to quantify the acoustic similarity between phonemes. The distance measures that were considered for this investigation are the Kullback-Leibler measure, the Bhattacharyya distance metric, the Mahalanobis measure, the Euclidean measure, the L2 metric and the Jeffreys-Matusita distance. Both the TIMIT and SUN Speech corpora were used. It was found that by selecting an appropriate distance measure, an automated procedure to map phonemes from a source language (English) to a target language (Afrikaans) can be applied, with recognition results comparable to a manual mapping process undertaken by a phonetic expert.

29 citations
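One way to picture the distance-based mapping described above is the sketch below: model each phoneme as a Gaussian over its acoustic features and assign every target-language phoneme to the nearest source-language phoneme under the Bhattacharyya distance. This is a hypothetical NumPy illustration; the feature dimensionality, Gaussian modeling, and phoneme labels are assumed, and the paper also evaluates several other distance measures.

```python
# Illustrative sketch (not the paper's code) of distance-based phoneme mapping
# using the Bhattacharyya distance between Gaussian phoneme models.
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def map_phonemes(source_models, target_models):
    """source_models/target_models: dict name -> (mean vector, covariance)."""
    mapping = {}
    for t_name, (t_mu, t_cov) in target_models.items():
        dists = {s: bhattacharyya(t_mu, t_cov, s_mu, s_cov)
                 for s, (s_mu, s_cov) in source_models.items()}
        mapping[t_name] = min(dists, key=dists.get)
    return mapping

# Toy usage with random 13-dimensional Gaussian phoneme models.
rng = np.random.default_rng(0)
def random_model(dim=13):
    return rng.normal(size=dim), np.diag(rng.uniform(0.5, 1.5, dim))

eng = {p: random_model() for p in ["aa", "iy", "s", "t"]}   # source (English)
afr = {p: random_model() for p in ["a", "i", "s"]}          # target (Afrikaans)
print(map_phonemes(eng, afr))
```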


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95