
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: In this article, the authors presented three new algorithms based on solutions for the maximum feasible subsystem problem (MAX FS) that improve on the state-of-the-art in recovery of compressed speech signals.
Abstract: The goal in signal compression is to reduce the size of the input signal without a significant loss in the quality of the recovered signal. One way to achieve this goal is to apply the principles of compressive sensing, but this has not been particularly successful for real-world signals that are insufficiently sparse, such as speech. We present three new algorithms based on solutions for the maximum feasible subsystem problem (MAX FS) that improve on the state of the art in recovery of compressed speech signals: more highly compressed signals can be successfully recovered with greater quality. The new recovery algorithms deliver sparser solutions when compared with those obtained using traditional compressive sensing recovery algorithms. When tested by recovering compressively sensed speech signals in the TIMIT speech database, the recovered speech has better perceptual quality than speech recovered using traditional compressive sensing recovery algorithms.
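The reference point for the MAX FS methods above is classical compressive sensing recovery. As an illustration of that baseline (not the paper's algorithm), a minimal orthogonal matching pursuit sketch in NumPy:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x with A @ x ~ y."""
    m, n = A.shape
    residual = y.copy()
    support = []
    x = np.zeros(n)
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit restricted to the chosen support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

# demo: recover a 3-sparse signal from 40 random measurements of a length-100 signal
rng = np.random.default_rng(0)
n, m, k = 100, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true
x_hat = omp(A, y, k)
print(np.linalg.norm(x_hat - x_true))
```

Real speech frames are only approximately sparse in a transform basis, which is exactly the regime where the paper argues MAX FS-based recovery improves on greedy baselines like this one.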
Proceedings Article
16 Jul 2006
TL;DR: This ongoing research project investigates articulatory feature (AF) classification using multiclass support vector machines (SVMs) to assess the AF classification performance of different multiclass generalizations of the SVM, including one-versus-rest, one-versus-one, Decision Directed Acyclic Graph (DDAG), and direct methods for multiclass learning.
Abstract: This ongoing research project investigates articulatory feature (AF) classification using multiclass support vector machines (SVMs). SVMs are being constructed for each AF in the multi-valued feature set (Table 1), using speech data and annotation from the IFA Dutch “Open-Source” (van Son et al. 2001) and TIMIT English (Garofolo et al. 1993) corpora. The primary objective of this research is to assess the AF classification performance of different multiclass generalizations of the SVM, including one-versus-rest, one-versus-one, Decision Directed Acyclic Graph (DDAG), and direct methods for multiclass learning. Observing the successful application of SVMs to numerous classification problems (Bennett and Campbell 2000), it is hoped that multiclass SVMs will outperform existing state-of-the-art AF classifiers. One of the most basic challenges for speech recognition and other spoken language systems is to accurately map data from the acoustic domain into the linguistic domain. Much speech processing research has approached this task by taking advantage of the correlation between phones, the basic units of speech sound, and their acoustic manifestation (intuitively, there is a range of sounds that humans would consider to be an “e”). The mapping of acoustic data to phones has been largely successful, and is used in many speech systems today. Despite its success, there are drawbacks to using phones as the point of entry from the acoustic to linguistic domains. Notably, the granularity of the “phonetic-segmental” model, in which speech is represented as a series of phones, makes it difficult to account for various subphone phenomena that affect performance on spontaneous speech. Researchers have pursued an alternative approach to the acoustic-linguistic mapping through the use of articulatory modeling. This approach more directly exploits the intimate relation between articulation and acoustics: the state of one’s speech articulators (e.g. 
vocal folds, tongue) uniquely determines the parameters of the acoustic speech signal. Unfortunately, while the mapping from articulator to acoustics is straightforward, the problem of recovering the state of the articulators from an acoustic speech representation, acoustic-to-articulatory inversion, poses a formidable challenge (Toutios and Margaritis 2003). Nevertheless, re-
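The one-versus-rest and one-versus-one strategies compared in the abstract can be sketched with scikit-learn's generic multiclass wrappers. The data here is synthetic (the paper trains on articulatory-feature labels from the IFA and TIMIT corpora), so this is an illustration of the two strategies only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

# stand-in for AF-labeled acoustic feature vectors (4 classes, 20-dim features)
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

accs = {}
for name, clf in [
    ("one-vs-rest", OneVsRestClassifier(SVC(kernel="rbf", C=1.0))),  # K binary SVMs
    ("one-vs-one", OneVsOneClassifier(SVC(kernel="rbf", C=1.0))),    # K(K-1)/2 pairwise SVMs
]:
    accs[name] = clf.fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: {accs[name]:.3f}")
```

One-versus-one trains more (but smaller) binary problems than one-versus-rest; which generalization wins in practice is exactly the empirical question the project investigates.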
01 Jan 2015
TL;DR: Experimental results showed that the proposed algorithm yielded relative reductions in error rate of 24.4% and 37.3% over the baseline systems for IVIE and TIMIT, respectively.
Abstract: In this paper, we propose an algorithm to improve the performance of speaker identification systems. A baseline speaker identification system uses a scoring of a test utterance against all speakers' models; this could be termed as an evaluation at the observation level. In the proposed approach, and prior to the standard evaluation phase, an algorithm based on a frame-level evaluation is applied. The speaker identification study is conducted using the IVIE corpus and 120 randomly selected speakers from TIMIT. Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM) are the main components in state-of-the-art speaker identification systems and will be adopted in this work. Experimental results based on several systems with different training and testing conditions showed that our proposed algorithm yielded relative reductions in error rate of 24.4% and 37.3% over the baseline systems for IVIE and TIMIT, respectively. The final performances, measured by identification error rate, are 3.4% and 5.2% for the IVIE and TIMIT corpora.
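The baseline MFCC + GMM pipeline the abstract builds on can be sketched as follows. To keep the example self-contained, real MFCC extraction (typically done with a signal-processing library) is replaced by synthetic 13-dimensional frames, and the mixtures are far smaller than the ones production systems use:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n_speakers, dim = 3, 13  # 13 MFCC-like coefficients per frame (illustrative)

# stand-in for per-speaker MFCC training frames: each speaker gets a distinct mean
means = rng.standard_normal((n_speakers, dim)) * 3
train = {s: means[s] + rng.standard_normal((500, dim)) for s in range(n_speakers)}

# one GMM per speaker (real systems use many more components, e.g. 32-1024)
models = {s: GaussianMixture(n_components=4, random_state=0).fit(X)
          for s, X in train.items()}

def identify(frames):
    # score() is the average per-frame log-likelihood; pick the best-scoring model
    return max(models, key=lambda s: models[s].score(frames))

test_frames = means[2] + rng.standard_normal((200, dim))
print(identify(test_frames))
```

Scoring every test frame against every speaker model is the "observation level" evaluation the abstract describes; the paper's contribution is an extra frame-level screening step applied before it.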
Proceedings ArticleDOI
01 Nov 2007
TL;DR: An adaptive system for voiced/unvoiced (V/UV) speech detection in the presence of background noise was implemented and the results were compared with a non-adaptive classification system and the V/UV detectors adopted by three important speech coding standards.
Abstract: The paper presents an adaptive system for voiced/unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection according to the output of a background noise classifier (NC) and a signal-to-noise ratio estimation (SNRE) system. The system was implemented and the tests performed using the TIMIT speech corpus and its phonetic classification. The results were compared with a non-adaptive classification system and the V/UV detectors adopted by three important speech coding standards: LPC10, ITU-T G.723.1, and ETSI AMR. In all cases the adaptive V/UV classifier outperformed the traditional solutions.
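A non-adaptive V/UV baseline of the kind the adaptive classifier is compared against can be sketched with two classic frame features, short-time energy and zero-crossing rate. The thresholds below are illustrative choices, not values from the paper:

```python
import numpy as np

def vuv_frames(signal, sr, frame_ms=25, energy_thr=0.01, zcr_thr=0.25):
    """Label fixed-size frames voiced (True) / unvoiced (False) using
    short-time energy and zero-crossing rate with fixed thresholds."""
    n = int(sr * frame_ms / 1000)
    labels = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        energy = np.mean(frame ** 2)
        # fraction of adjacent sample pairs that change sign
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        # voiced speech: high energy and low zero-crossing rate
        labels.append(bool(energy > energy_thr and zcr < zcr_thr))
    return labels

# toy signal: half a second of a vowel-like 120 Hz tone, then fricative-like noise
sr = 16000
t = np.arange(sr) / sr
voiced = 0.5 * np.sin(2 * np.pi * 120 * t[:sr // 2])
rng = np.random.default_rng(0)
unvoiced = 0.05 * rng.standard_normal(sr // 2)
labels = vuv_frames(np.concatenate([voiced, unvoiced]), sr)
```

Fixed thresholds like these degrade quickly as noise conditions change, which is the gap the paper's noise-classifier-driven adaptive feature selection is designed to close.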

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95