Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Dissertation
01 Jan 2016
TL;DR: This thesis aims to enhance the robustness of ASR systems in complex auditory environments by developing new front-end acoustic feature extraction techniques, complemented by a one-way Analysis of Variance (ANOVA)-based camera fusion strategy for audio-visual recognition.
Abstract: In this era of smart applications, Automatic Speech Recognition (ASR) has established itself as an emerging technology that is becoming more popular by the day. However, the accuracy and reliability of these systems are restricted by acoustic conditions such as background noise and channel noise. Thus, there is a considerable gap in human-machine communication due to the lack of robustness in complex auditory scenes. The objective of this thesis is to enhance the robustness of the system in complex auditory environments by developing new front-end acoustic feature extraction techniques. The pros and cons of the different techniques are also highlighted. In recent years, wavelet-based acoustic features have become popular for speech recognition applications. The wavelet transform is an excellent tool for time-frequency analysis with good signal-denoising properties. New auditory-based Wavelet Packet (WP) features are proposed to enhance system performance across different types of noisy conditions. The proposed technique is designed so that it mimics the frequency response of the human ear according to the Equivalent Rectangular Bandwidth (ERB) scale. In subsequent chapters, further developments of the proposed technique are discussed using Sub-band based Periodicity and Aperiodicity Decomposition (SPADE) and harmonic analysis. The TIMIT (English) and CSIR-TIFR (Hindi) phoneme recognition tasks are carried out to evaluate the performance of the proposed technique. The simulation results demonstrate the potential of the proposed techniques to enhance system accuracy over a wide range of SNRs. Further, the visual modality plays a vital role in computer vision systems when the acoustic modality is disturbed by background noise. However, most systems rarely address visual-domain problems, which limits their use in real-world conditions. A multiple-camera protocol gives the system more flexibility by allowing speakers to move freely. In the last chapter, consideration is given to Audio-Visual Speech Recognition (AVSR) in vehicular environments, which resulted in one novel contribution: the one-way Analysis of Variance (ANOVA)-based camera fusion strategy. Multiple-camera fusion is an essential part of multi-camera computer vision applications. The ANOVA-based approach is proposed to study the relative contribution of each camera for AVSR experiments in in-vehicle environments. The four-camera automotive audio-visual corpus is used to investigate the performance of the proposed technique. Speech is a primary medium of communication for humans, and various speech-based applications can work reliably only by improving the performance of ASR across different environments. In the modern era, there is vast potential and immense possibility for using speech effectively as a communication medium between human and machine. Robust and reliable speech technology enables people to experience the full benefits of Information and Communication Technology (ICT).
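To make the ERB-scale idea concrete, here is a minimal sketch, not the thesis implementation, of laying out analysis bands on the Equivalent Rectangular Bandwidth scale using the standard Glasberg and Moore formulas; the band count, frequency range, and the MFCC-style pooling step mentioned in the comments are assumptions for illustration only.

```python
# A minimal sketch (assumed parameters, not the thesis code) of ERB-scale band layout.
import numpy as np

def hz_to_erb_rate(f_hz):
    """Glasberg & Moore (1990) ERB-rate scale (in Cams)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_low=100.0, f_high=8000.0, n_bands=24):
    """Band edges equally spaced on the ERB-rate scale between f_low and f_high."""
    erb_edges = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_bands + 1)
    return erb_rate_to_hz(erb_edges)

# Uniform wavelet-packet subbands would then be pooled into these auditory bands
# (e.g., before log compression and a DCT) to mimic the ear's frequency resolution.
print(np.round(erb_band_edges(), 1))
```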

3 citations

Proceedings ArticleDOI
25 Oct 1994
TL;DR: In this paper, a Gaussian mixture speaker model was used for speaker identification and experiments were conducted on the TIMIT and NTIMIT databases, achieving accuracies of 99.5% and 60.7% for clean, wideband speech and telephone speech, respectively.
Abstract: The two largest factors affecting automatic speaker identification performance are the size of the population to be distinguished among and the degradations introduced by noisy communication channels (e.g., telephone transmission). To experimentally examine these two factors, this paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification and experiments are conducted on the TIMIT and NTIMIT databases. The aims of this study are to (1) establish how well text-independent speaker identification can perform under near-ideal conditions for very large populations (using the TIMIT database), (2) gauge the performance loss incurred by transmitting the speech over the telephone network (using the NTIMIT database), and (3) examine the validity of current models of telephone degradations commonly used in developing compensation techniques (using the NTIMIT calibration signals). These are believed to be the first speaker identification experiments on the complete 630-speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% are achieved on the TIMIT and NTIMIT databases, respectively.
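As an illustration of the kind of system described, the following sketch trains one Gaussian mixture model per enrolled speaker and identifies a test utterance by the highest average frame log-likelihood. The choice of MFCC features via librosa and diagonal-covariance GMMs via scikit-learn is an assumption; the paper does not prescribe these libraries.

```python
# A minimal sketch of text-independent GMM speaker identification (assumed tooling).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, sr=16000, n_mfcc=20):
    """Load audio and return MFCC feature vectors, one row per frame."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_models(enrollment, n_components=32):
    """enrollment: dict mapping speaker_id -> list of enrollment wav paths."""
    models = {}
    for spk, paths in enrollment.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[spk] = GaussianMixture(n_components=n_components,
                                      covariance_type='diag').fit(X)
    return models

def identify(models, wav_path):
    """Return the speaker whose GMM gives the highest average frame log-likelihood."""
    X = mfcc_frames(wav_path)
    scores = {spk: gmm.score(X) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```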

3 citations

01 Jan 2010
TL;DR: The speedup technique used in this paper partially prunes VQ codebook mean vectors using partial distortion elimination (PDE) to accelerate the bottleneck component of speaker identification.
Abstract: Matching the feature vectors extracted from an unknown speaker's speech sample against the models of registered speakers is the most time-consuming component of real-time speaker identification systems. The controlling parameters are the size and count of the extracted test feature vectors as well as the size, complexity, and count of the registered speakers' models. We study vector quantization (VQ), which is less investigated than the Gaussian mixture model (GMM), for accelerating this bottleneck component of speaker identification. Previously reported acceleration techniques for the VQ approach reduce the test feature vector count by pre-quantization and reduce the candidate registered speakers by pruning unlikely ones, thereby introducing a risk of accuracy degradation. The speedup technique used in this paper partially prunes VQ codebook mean vectors using partial distortion elimination (PDE). Acceleration factors of up to 3.29 on the 630 registered speakers of the TIMIT 8 kHz speech data and 4 on the 91 registered speakers of the CSLU speech data are achieved, respectively.
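For clarity, here is a minimal sketch, not the paper's code, of partial distortion elimination in a VQ nearest-codeword search: the running squared-error sum for a candidate codeword is abandoned as soon as it exceeds the best distortion found so far, which is the pruning idea the abstract refers to.

```python
# A minimal sketch of partial distortion elimination (PDE) in VQ search.
import numpy as np

def nearest_codeword_pde(x, codebook):
    """x: (dim,) test vector; codebook: (n_codewords, dim). Returns (index, distortion)."""
    best_idx, best_dist = -1, np.inf
    for i, c in enumerate(codebook):
        partial = 0.0
        for d in range(x.shape[0]):
            partial += (x[d] - c[d]) ** 2
            if partial >= best_dist:      # PDE: this codeword can no longer win, stop early
                break
        else:                             # loop finished without break -> new best match
            best_idx, best_dist = i, partial
    return best_idx, best_dist

def vq_distortion(test_vectors, codebook):
    """Average VQ distortion of a test utterance against one speaker's codebook."""
    return np.mean([nearest_codeword_pde(x, codebook)[1] for x in test_vectors])
```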

3 citations

Book ChapterDOI
11 Sep 2017
TL;DR: In this paper, a novel modification of the ladder network was proposed for semi-supervised learning of recurrent neural networks, which was evaluated with a phoneme recognition task on the TIMIT corpus.
Abstract: Ladder networks are a notable new concept in the field of semi-supervised learning, showing state-of-the-art results in image recognition tasks while being compatible with many existing neural architectures. We present the recurrent ladder network, a novel modification of the ladder network, for semi-supervised learning of recurrent neural networks, which we evaluate with a phoneme recognition task on the TIMIT corpus. Our results show that the model is able to consistently outperform the baseline and achieve fully-supervised baseline performance with only 75% of all labels, which demonstrates that the model is capable of using unsupervised data as an effective regulariser.
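The following sketch illustrates only the core training idea behind ladder-style semi-supervised learning for a recurrent model: a supervised phoneme-classification loss on labelled batches combined with an unsupervised denoising/reconstruction loss on all batches. It is not the authors' architecture; the layer-wise lateral connections and per-layer denoising costs of the full ladder network are omitted, and all sizes and the use of PyTorch are assumptions.

```python
# A minimal sketch of combining supervised and denoising losses for an RNN (assumed setup).
import torch
import torch.nn as nn

class RecurrentLadderSketch(nn.Module):
    def __init__(self, n_features=40, n_hidden=128, n_phonemes=39, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.GRU(n_features, n_hidden, batch_first=True)
        self.classifier = nn.Linear(n_hidden, n_phonemes)   # supervised head
        self.decoder = nn.Linear(n_hidden, n_features)      # reconstruction head

    def forward(self, x):
        # The corrupted input drives both heads; the clean input is the reconstruction target.
        x_noisy = x + self.noise_std * torch.randn_like(x)
        h, _ = self.encoder(x_noisy)
        return self.classifier(h), self.decoder(h)

def semi_supervised_loss(model, x, labels=None, recon_weight=0.1):
    logits, recon = model(x)
    loss = recon_weight * nn.functional.mse_loss(recon, x)   # unsupervised term
    if labels is not None:                                    # supervised term, labelled data only
        loss = loss + nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    return loss

# Example: one labelled and one unlabelled batch of 16 utterances, 100 frames each.
model = RecurrentLadderSketch()
x_lab, y_lab = torch.randn(16, 100, 40), torch.randint(0, 39, (16, 100))
x_unlab = torch.randn(16, 100, 40)
loss = semi_supervised_loss(model, x_lab, y_lab) + semi_supervised_loss(model, x_unlab)
loss.backward()
```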

3 citations

Proceedings ArticleDOI
26 Jul 2021
TL;DR: In this paper, a deep neural network (DNN) system is proposed for the automatic detection of speech in audio signals, otherwise known as voice activity detection (VAD); of the DNN types investigated, convolutional neural networks performed best.
Abstract: In this paper, we propose a deep neural network (DNN) system for the automatic detection of speech in audio signals, otherwise known as voice activity detection (VAD). Several DNN types were investigated, including multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), with the best performance being obtained for the latter. Additional postprocessing techniques, i.e., hysteresis thresholds, minimum duration filtering, and bilateral extension, were employed in order to boost performance. The systems were trained and tested using several data subsets of the CENSREC-1-C database, with different simulated ambient noise conditions, and additional testing was done on a different CENSREC-1-C data subset containing actual ambient noise, as well as on a subset of the TIMIT database. An accuracy of up to 99.13% was obtained for the CENSREC-1-C datasets, and 97.60% for the TIMIT dataset.
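To show what the three postprocessing steps do, here is a minimal sketch, not the paper's system, that applies hysteresis thresholding, minimum-duration filtering, and bilateral extension to per-frame speech probabilities produced by any DNN; all thresholds and frame counts are assumed values.

```python
# A minimal sketch of the three VAD postprocessing steps named in the abstract.
import numpy as np

def hysteresis(probs, t_on=0.7, t_off=0.3):
    """Enter speech above t_on, leave only below t_off (reduces rapid flickering)."""
    active, out = False, np.zeros(len(probs), dtype=bool)
    for i, p in enumerate(probs):
        active = p > t_on if not active else p > t_off
        out[i] = active
    return out

def min_duration(mask, min_frames=10):
    """Drop speech segments shorter than min_frames."""
    out, start = mask.copy(), None
    for i, v in enumerate(np.append(mask, False)):
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start < min_frames:
                out[start:i] = False
            start = None
    return out

def bilateral_extension(mask, pad_frames=5):
    """Extend each detected segment by pad_frames on both sides."""
    out = mask.copy()
    for i in np.flatnonzero(mask):
        out[max(0, i - pad_frames): i + pad_frames + 1] = True
    return out

# Usage: vad = bilateral_extension(min_duration(hysteresis(frame_probs)))
```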

3 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95