Topic

TIMIT

About: TIMIT is a research topic centered on a corpus of read American English speech that is widely used to benchmark phone recognition and speaker recognition systems. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
15 Apr 2018
TL;DR: In this paper, a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network is trained for end-to-end phone recognition.
Abstract: We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.

91 citations
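As a rough illustration of the front end described in the abstract above, the sketch below builds a learnable time-domain filterbank in PyTorch: a bank of cosine/sine (complex) filters initialized at mel-spaced center frequencies, applied to the raw waveform, followed by squared modulus and low-pass averaging. The filter count, kernel length, Hann windowing, stride, and log compression are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal TD-filterbank sketch (assumed hyperparameters, not the paper's code).
import numpy as np
import torch
import torch.nn as nn

def mel_spaced_hz(n_filters, f_min=40.0, f_max=8000.0):
    """Center frequencies equally spaced on the mel scale (HTK formula)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return inv_mel(np.linspace(mel(f_min), mel(f_max), n_filters))

class TDFilterbank(nn.Module):
    def __init__(self, n_filters=40, kernel_size=401, sr=16000, stride=160):
        super().__init__()
        # One real (cosine) and one imaginary (sine) filter per band.
        self.conv = nn.Conv1d(1, 2 * n_filters, kernel_size,
                              stride=1, padding=kernel_size // 2, bias=False)
        t = (np.arange(kernel_size) - kernel_size // 2) / sr
        window = np.hanning(kernel_size)
        weights = []
        for fc in mel_spaced_hz(n_filters, f_max=sr / 2):
            weights.append(window * np.cos(2 * np.pi * fc * t))
            weights.append(window * np.sin(2 * np.pi * fc * t))
        self.conv.weight.data = torch.tensor(np.stack(weights),
                                             dtype=torch.float32).unsqueeze(1)
        # Low-pass averaging, analogous to the framing/averaging step.
        self.pool = nn.AvgPool1d(kernel_size=400, stride=stride)

    def forward(self, wav):              # wav: (batch, 1, samples)
        x = self.conv(wav)               # (batch, 2*n_filters, samples)
        real, imag = x[:, 0::2], x[:, 1::2]
        energy = real ** 2 + imag ** 2   # squared complex modulus
        return torch.log1p(self.pool(energy))
```

Because the filter weights are ordinary Conv1d parameters, they can be fine-tuned jointly with whatever convolutional phone recognizer consumes the output, which is the point of the learnable front end.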

Proceedings ArticleDOI
14 Mar 2010
TL;DR: The viability of the posteriorgram approach for handling many talkers is demonstrated by finding clusters of words in the TIMIT corpus.
Abstract: In this paper, we explore the use of a Gaussian posteriorgram based representation for unsupervised discovery of speech patterns. Compared with our previous work, the new approach provides significant improvement towards speaker independence. The framework consists of three main procedures: a Gaussian posteriorgram generation procedure which learns an unsupervised Gaussian mixture model and labels each speech frame with a Gaussian posteriorgram representation; a segmental dynamic time warping procedure which locates pairs of similar sequences of Gaussian posteriorgram vectors; and a graph clustering procedure which groups similar sequences into clusters. We demonstrate the viability of using the posteriorgram approach to handle many talkers by finding clusters of words in the TIMIT corpus.

90 citations
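A minimal sketch of the Gaussian posteriorgram representation described above, assuming acoustic frames (e.g. MFCCs) have already been extracted. The GMM size and the frame-level distance are illustrative assumptions, and the segmental DTW and graph clustering stages are not reproduced here.

```python
# Gaussian posteriorgram sketch (assumed sizes; DTW/clustering stages omitted).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_posteriorgram_model(frames, n_components=50):
    """Fit an unsupervised GMM on pooled speech frames of shape (n_frames, n_dims)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(frames)
    return gmm

def posteriorgram(gmm, utterance_frames):
    """Label each frame with its vector of Gaussian posterior probabilities."""
    return gmm.predict_proba(utterance_frames)   # (n_frames, n_components)

def frame_distance(p, q, eps=1e-10):
    """Distance between two posteriorgram vectors (-log of their dot product),
    a common choice for aligning posteriorgram sequences with segmental DTW."""
    return -np.log(np.dot(p, q) + eps)
```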

Proceedings ArticleDOI
Raman Arora, Karen Livescu
26 May 2013
TL;DR: The behavior of CCA-based acoustic features is studied on the task of phonetic recognition, investigating to what extent they are speaker-independent or domain-independent.
Abstract: Canonical correlation analysis (CCA) and kernel CCA can be used for unsupervised learning of acoustic features when a second view (e.g., articulatory measurements) is available for some training data, and such projections have been used to improve phonetic frame classification. Here we study the behavior of CCA-based acoustic features on the task of phonetic recognition, and investigate to what extent they are speaker-independent or domain-independent. The acoustic features are learned using data drawn from the University of Wisconsin X-ray Microbeam Database (XRMB). The features are evaluated within and across speakers on XRMB data, as well as on out-of-domain TIMIT and MOCHA-TIMIT data. Experimental results show consistent improvement with the learned acoustic features over baseline MFCCs and PCA projections. In both speaker-dependent and cross-speaker experiments, phonetic error rates are improved by 4-9% absolute (10-23% relative) using CCA-based features over baseline MFCCs. In cross-domain phonetic recognition (training on XRMB and testing on MOCHA or TIMIT), the learned projections provide smaller improvements.

89 citations
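The sketch below illustrates the multi-view idea in the abstract above using scikit-learn's linear CCA (the paper also evaluates kernel CCA, which is not shown): a projection is learned from paired acoustic and articulatory frames, and only the acoustic view is projected at test time. Array shapes and the number of components are illustrative assumptions.

```python
# Linear CCA acoustic-feature sketch (assumed dimensions; kernel CCA omitted).
from sklearn.cross_decomposition import CCA

def learn_cca_projection(X_acoustic, X_artic, n_components=10):
    """X_acoustic: (n_frames, d_acoustic) MFCC frames; X_artic: (n_frames, d_artic)
    articulatory measurements available only for the training data.
    n_components must not exceed the smaller view's dimensionality."""
    cca = CCA(n_components=n_components)
    cca.fit(X_acoustic, X_artic)
    return cca

def project_acoustic(cca, X_acoustic_new):
    """At test time only the acoustic view is needed; its CCA projection
    (possibly appended to the original MFCCs) is used for recognition."""
    return cca.transform(X_acoustic_new)
```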

Journal ArticleDOI
TL;DR: A new scheme is proposed that removes phase sensitivity by using an analytical description of the ICA-adapted basis functions via the Hilbert transform; because the basis functions are not shift invariant, the scheme is further extended with a frequency-based ICA stage that removes redundant time-shift information.

89 citations
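A small illustration of the Hilbert-transform step mentioned in the TL;DR above: taking the analytic signal of a basis-function response and keeping only its magnitude discards phase, which is the source of the phase sensitivity the paper removes. The basis function here is a placeholder, not an actual ICA-adapted filter, and the frequency-based ICA stage is not shown.

```python
# Phase-insensitive response via the analytic signal (placeholder basis function).
import numpy as np
from scipy.signal import hilbert

def phase_insensitive_response(waveform, basis_fn):
    """Convolve with a basis function, then take the analytic-signal envelope."""
    response = np.convolve(waveform, basis_fn, mode='same')
    analytic = hilbert(response)   # response + i * Hilbert(response)
    return np.abs(analytic)        # magnitude envelope, insensitive to phase
```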

Proceedings ArticleDOI
12 May 2008
TL;DR: A novel feature set for speaker recognition that is based on the voice source signal that is robust to LPC analysis errors and low-frequency phase distortion and compares favourably to other proposed voice source feature sets.
Abstract: We propose a novel feature set for speaker recognition that is based on the voice source signal. The feature extraction process uses closed-phase LPC analysis to estimate the vocal tract transfer function. The LPC spectrum envelope is converted to cepstrum coefficients which are used to derive the voice source features. Unlike approaches based on inverse-filtering, our procedure is robust to LPC analysis errors and low-frequency phase distortion. We have performed text-independent closed-set speaker identification experiments on the TIMIT and the YOHO databases using a standard Gaussian mixture model technique. Compared to using mel-frequency cepstrum coefficients, the misclassification rate for the TIMIT database was reduced from 1.51% to 0.16% when combined with the proposed voice source features. For the YOHO database the misclassification rate decreased from 13.79% to 10.07%. The new feature vector also compares favourably to other proposed voice source feature sets.

89 citations
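The abstract above evaluates its features with a standard GMM classifier for closed-set speaker identification; the sketch below shows that back end only (the closed-phase LPC voice source feature extraction is not reproduced). One GMM is trained per enrolled speaker and a test utterance is assigned to the highest-scoring model. Mixture sizes are illustrative assumptions.

```python
# Closed-set GMM speaker identification back end (assumed mixture sizes).
from sklearn.mixture import GaussianMixture

def enroll(speaker_features, n_components=32):
    """speaker_features: dict mapping speaker id -> (n_frames, n_dims) feature array."""
    models = {}
    for spk, feats in speaker_features.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
        gmm.fit(feats)
        models[spk] = gmm
    return models

def identify(models, test_features):
    """Return the enrolled speaker whose GMM gives the highest average log-likelihood."""
    scores = {spk: gmm.score(test_features) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```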


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95