Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper presents cross-language automatic phonetic segmentation using Hidden Markov Models (HMMs) so as to provide extensive models that will be applicable across languages.
Abstract: Annotation of large multilingual corpora remains a challenge for the data-driven approach to speech research, especially for under-resourced languages. This paper presents cross-language automatic phonetic segmentation using Hidden Markov Models (HMMs). The underlying notion is segmentation based on articulation (manner and place), so as to provide extensive models that will be applicable across languages. A test on the Appen Spanish speech corpus gives a phone recognition accuracy of 61.15% when bootstrapped with acoustic models trained on TIMIT, compared with a baseline result of 54.63% for flat-start initialization of the monophone models.
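
To make the bootstrapping idea above concrete, here is a minimal sketch, assuming MFCC features plus the librosa and hmmlearn Python packages (neither is specified by the paper, and the helper names are illustrative): a monophone HMM is either flat-started from the pooled target-language data or seeded from a model trained on TIMIT before re-estimation.

# Sketch: flat-start vs. TIMIT-bootstrapped monophone training (assumed tools).
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    # 13-dim MFCCs with a 10 ms hop; a common front end, assumed here.
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=160).T

def train_monophone(feature_list, init_model=None, n_states=3):
    # Train one 3-state monophone HMM. With init_model (e.g. trained on TIMIT),
    # its Gaussian means seed the new model instead of a flat start.
    X = np.vstack(feature_list)
    lengths = [len(f) for f in feature_list]
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    if init_model is not None:
        hmm.init_params = "stc"               # keep the bootstrapped means below
        hmm.means_ = init_model.means_.copy()
    hmm.fit(X, lengths)                       # Baum-Welch re-estimation on target data
    return hmm

Flat-start training corresponds to calling train_monophone(features) with no init_model; the 61.15% vs. 54.63% gap reported above comes from this kind of initialization difference at full system scale, not from this toy code.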

8 citations

Journal ArticleDOI
TL;DR: The experiments on TIMIT phone recognition and the Wall Street Journal 5K-vocabulary continuous speech recognition show that eigentriphones estimated from state clusters defined by the nodes in the same phonetic regression class tree used in state tying result in further performance gain.
Abstract: Most automatic speech recognizers employ tied-state triphone hidden Markov models (HMM), in which the corresponding triphone states of the same base phone are tied. State tying is commonly performed with the use of a phonetic regression class tree, which renders robust context-dependent modeling possible by carefully balancing the amount of training data with the degree of tying. However, tying inevitably introduces quantization error: triphones tied to the same state are not distinguishable in that state. Recently we proposed a new triphone modeling approach called eigentriphone modeling, in which all triphone models are, in general, distinct. The idea is to create an eigenbasis for each base phone (or phone state) and to represent all its triphones (or triphone states) as distinct points in the space spanned by the basis. We have shown that triphone HMMs trained using model-based or state-based eigentriphones perform at least as well as conventional tied-state HMMs. In this paper, we further generalize the definition of eigentriphones over clusters of acoustic units. Our experiments on TIMIT phone recognition and Wall Street Journal 5K-vocabulary continuous speech recognition show that eigentriphones estimated from state clusters defined by the nodes of the same phonetic regression class tree used in state tying result in a further performance gain.
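
A toy numpy/scikit-learn sketch of the eigentriphone idea, under strong simplifications (Gaussian mean supervectors as the per-triphone representation and plain PCA as the eigenbasis; real systems operate on HMM state parameters and weight the basis by training counts, and all names below are illustrative):

# Toy eigentriphone sketch: build an eigenbasis per base phone from its
# triphone mean supervectors; each triphone becomes a distinct point
# (its coordinates) in the spanned space.
import numpy as np
from sklearn.decomposition import PCA

def eigentriphone_basis(triphone_means, n_eigen=10):
    # triphone_means: (n_triphones, dim) mean supervectors for one base phone.
    n_eigen = min(n_eigen, len(triphone_means) - 1)
    pca = PCA(n_components=n_eigen)
    coords = pca.fit_transform(triphone_means)   # one distinct point per triphone
    return pca, coords

def reconstruct(pca, coords):
    # Map eigen-coordinates back to smoothed triphone means.
    return pca.inverse_transform(coords)

# Example with fake data: 50 triphones of one base phone, 39-dim means.
rng = np.random.default_rng(0)
means = rng.normal(size=(50, 39))
pca, coords = eigentriphone_basis(means)
smoothed_first_triphone = reconstruct(pca, coords[:1])

Because every triphone keeps its own coordinates, no two triphones are forced to share identical parameters, which is exactly the quantization error that state tying introduces.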

8 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: In this article, a text-independent speaker classifier is trained using in-domain speaker data, and features of conversational speech from out-of-domain speakers are then converted into likelihood vectors, i.e., similarity scores compared to the in-domain speakers.
Abstract: The mechanism proposed here is for real-time speaker change detection in conversations. It first trains a neural-network text-independent speaker classifier using in-domain speaker data. Through the network, features of conversational speech from out-of-domain speakers are then converted into likelihood vectors, i.e., similarity scores comparing them to the in-domain speakers. These transformed features show very distinctive patterns, which facilitates differentiating speakers and enables speaker change detection with straightforward distance metrics. The speaker classifier and the speaker change detector are trained/tested using speech of the first 200 (in-domain) and the remaining 126 (out-of-domain) male speakers in TIMIT, respectively. For speaker classification, 100% accuracy at a 200-speaker size is achieved on any testing file, given that the speech duration is at least 0.97 seconds. For speaker change detection using the speaker classification outputs, performance based on 0.5, 1, and 2 second inspection intervals was evaluated in terms of error rate and F1 score, using data synthesized by concatenating speech from various speakers. The detector captures close to 97% of the changes by comparing the current second of speech with the previous second, which is very competitive with other methods reported in the literature.
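
A condensed sketch of the pipeline described above, assuming scikit-learn and cosine distance as the straightforward metric (the paper's actual network architecture and metric may differ, and the function names are made up for illustration):

# Sketch: speaker change detection from classifier posterior (likelihood) vectors.
import numpy as np
from sklearn.neural_network import MLPClassifier
from scipy.spatial.distance import cosine

def train_speaker_classifier(frames, speaker_ids):
    # frames: (n_frames, dim) acoustic features from in-domain speakers.
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    clf.fit(frames, speaker_ids)
    return clf

def likelihood_vector(clf, segment_frames):
    # Average per-frame posteriors over a segment (e.g. one second of speech).
    return clf.predict_proba(segment_frames).mean(axis=0)

def detect_changes(clf, segments, threshold=0.5):
    # Flag a change wherever consecutive segments' posterior vectors diverge.
    vecs = [likelihood_vector(clf, seg) for seg in segments]
    return [cosine(vecs[i - 1], vecs[i]) > threshold for i in range(1, len(vecs))]

The second-by-second comparison in the abstract corresponds to passing one-second segments to detect_changes; the threshold would be tuned on held-out synthesized conversations.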

8 citations

Journal ArticleDOI
TL;DR: In this paper, an unsupervised speech separation method was proposed that combines Convolutional Non-Negative Matrix Factorization with Joint Approximate Diagonalization of Eigenmatrices.
Abstract: As network-supporting devices and sensors in the Internet of Things leap forward, vast amounts of real-world data will be generated for human intelligent applications. Speech sensor networks, an important part of the Internet of Things, have numerous application needs. Indeed, the sensor data can further help intelligent applications to provide higher-quality services, although this data may contain considerable noise. Accordingly, speech signal processing methods are urgently needed to acquire low-noise and effective speech data. Blind source separation and enhancement are among the representative techniques. However, in an unsupervised complex environment where only a single-channel signal is present, many technical challenges stand in the way of single-channel, multi-person mixed speech separation. For this reason, this study develops an unsupervised speech separation method, CNMF+JADE, i.e., a hybrid method combining Convolutional Non-Negative Matrix Factorization and Joint Approximate Diagonalization of Eigenmatrices. Moreover, an adaptive wavelet transform-based speech enhancement technique is proposed, capable of adaptively and effectively enhancing the separated speech signal. The proposed method aims to yield a general and efficient speech processing algorithm for data acquired by speech sensors. The experimental results show that, on the TIMIT speech sources, the proposed method can effectively extract the target speaker from mixed speech with a tiny training sample. The algorithm is highly general and robust, and can technically support the processing of speech signals acquired by most speech sensors.
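
As a rough illustration only, the sketch below substitutes plain NMF on a magnitude spectrogram (scikit-learn and librosa, both assumed) for the paper's convolutional NMF, and omits the JADE and adaptive-wavelet-enhancement stages entirely:

# Simplified single-channel separation sketch: plain NMF plus a soft mask.
# A stand-in for the CNMF+JADE pipeline, not a reproduction of it.
import numpy as np
import librosa
from sklearn.decomposition import NMF

def separate_two_sources(wav_path, n_components=40, n_fft=1024, hop=256):
    y, sr = librosa.load(wav_path, sr=16000)
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(S), np.angle(S)

    nmf = NMF(n_components=n_components, init="nndsvd", max_iter=400)
    W = nmf.fit_transform(mag)              # spectral bases   (freq x comps)
    H = nmf.components_                     # activations      (comps x time)

    # Naive grouping: first half of the components -> source 1, rest -> source 2.
    half = n_components // 2
    est1 = W[:, :half] @ H[:half]
    est2 = W[:, half:] @ H[half:]
    mask1 = est1 / (est1 + est2 + 1e-8)     # soft, Wiener-like mask

    src1 = librosa.istft(mask1 * mag * np.exp(1j * phase), hop_length=hop)
    src2 = librosa.istft((1 - mask1) * mag * np.exp(1j * phase), hop_length=hop)
    return src1, src2, sr

In practice the components would be grouped with speaker-trained bases (or, as in the paper, disentangled further with JADE) rather than by the arbitrary split used here.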

8 citations

Journal ArticleDOI
TL;DR: In this article, a hybrid approach for automatic speech recognition, TriNNOnto, is proposed. It integrates several components: a Language Model combined with a dynamic Triune Ontology generation scheme, and an Acoustic Model and feature modelling hybridised through a Tribonacci-based Deep Neural Network that decides the number of layers depending on the size of the samples and their count.
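
The TL;DR above gives few implementation details; purely as one hypothetical reading of a Tribonacci-based rule that decides the number of layers from the sample count, the sketch below picks a layer count from the Tribonacci sequence. Every name and constant here is an assumption, not the paper's method.

# Hypothetical sketch only: choose a hidden-layer count from the Tribonacci
# sequence (1, 1, 2, 4, 7, 13, ...) based on the number of training samples.
def tribonacci(n):
    # First n Tribonacci numbers, starting 1, 1, 2.
    seq = [1, 1, 2]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2] + seq[-3])
    return seq[:n]

def choose_num_layers(num_samples, samples_per_layer=1000, max_terms=10):
    # Largest Tribonacci number k with samples_per_layer * k <= num_samples;
    # the 1000-samples-per-layer ratio is an arbitrary illustrative choice.
    fits = [k for k in tribonacci(max_terms) if samples_per_layer * k <= num_samples]
    return fits[-1] if fits else 1

print(choose_num_layers(5000))   # -> 4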

8 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years:
2023: 24
2022: 62
2021: 67
2020: 86
2019: 77
2018: 95