Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.

Papers published on a yearly basis (chart; years shown: 1989 to 2023)

Papers

Journal Article

A genetic classification method for speaker recognition

Qingyang Hong¹, Sam Kwong¹

City University of Hong Kong¹

01 Feb 2005-Engineering Applications of Artificial Intelligence

TL;DR: Proposes a hybrid training method based on a genetic algorithm (GA) that combines the global search capability of the GA with the effectiveness of the ML method.

42 citations

Posted Content

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

Liang Lu, Lingpeng Kong, Chris Dyer¹, Noah A. Smith², Steve Renals

Carnegie Mellon University¹, University of Washington²

01 Mar 2016-arXiv: Computation and Language

TL;DR: Studies a segmental recurrent neural network for end-to-end acoustic modelling, which connects a segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction and does not rely on an external system to provide features or segmentation boundaries.

Abstract: We study the segmental recurrent neural network for end-to-end acoustic modelling. This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction. Compared to most previous CRF-based acoustic models, it does not rely on an external system to provide features or segmentation boundaries. Instead, this model marginalises out all the possible segmentations, and features are extracted from the RNN trained together with the segmental CRF. In essence, this model is self-contained and can be trained end-to-end. In this paper, we discuss practical training and decoding issues as well as the method to speed up the training in the context of speech recognition. We performed experiments on the TIMIT dataset. We achieved a 17.3% phone error rate (PER) from the first-pass decoding, the best reported result using CRFs, despite using only a zeroth-order CRF and no language model.
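
For readers unfamiliar with the phone error rate (PER) quoted in the abstract above, the following is a minimal sketch of how such a metric is commonly computed: the total edit distance between reference and hypothesised phone sequences, divided by the total number of reference phones. The function names and the toy phone sequences are illustrative assumptions, not taken from the paper, and real TIMIT scoring usually also involves a phone-set mapping that is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(references, hypotheses):
    """PER in percent: total edit errors over total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


# Toy data (hypothetical): one substitution and one deletion over 10 phones.
refs = [["sil", "dh", "ih", "s", "sil"], ["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "d", "ih", "s", "sil"], ["sil", "k", "ae", "sil"]]
per = phone_error_rate(refs, hyps)
print(f"PER = {per:.1f}%")
```

With these toy sequences there are 2 errors over 10 reference phones, so the sketch reports a PER of 20.0%.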

42 citations

Proceedings Article

Deep neural support vector machines for speech recognition

Shi-Xiong Zhang¹, Chaojun Liu¹, Kaisheng Yao¹, Yifan Gong¹

Microsoft¹

19 Apr 2015

TL;DR: Presents a new type of deep neural network (DNN) that uses a support vector machine (SVM) at the top layer for classification; its effectiveness is verified on the TIMIT task for continuous speech recognition.

Abstract: A new type of deep neural network (DNN) is presented in this paper. Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer for classification. The new DNN instead uses a support vector machine (SVM) at the top layer. Two training algorithms are proposed, at the frame level and the sequence level, to learn the parameters of the SVM and DNN under maximum-margin criteria. In the frame-level training, the new model is shown to be related to the multiclass SVM with DNN features; in the sequence-level training, it is related to the structured SVM with DNN features and HMM state transition features. Its decoding process is similar to the DNN-HMM hybrid system but with frame-level posterior probabilities replaced by scores from the SVM. We term the new model deep neural support vector machine (DNSVM). We have verified its effectiveness on the TIMIT task for continuous speech recognition.
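
To illustrate the contrast with a softmax top layer, the sketch below computes a standard multiclass (Crammer-Singer style) hinge loss on per-frame class scores. This is an assumption about the general kind of maximum-margin criterion involved, not the exact objective used in the paper; the function name, the margin value and the toy scores are hypothetical.

```python
def multiclass_hinge_loss(scores, label, margin=1.0):
    """Hinge loss for one frame: penalise the strongest wrong class
    if it comes within `margin` of the correct class score."""
    correct = scores[label]
    strongest_wrong = max(s for k, s in enumerate(scores) if k != label)
    return max(0.0, margin + strongest_wrong - correct)


# Toy top-layer scores for two frames over three classes (hypothetical).
frame_scores = [[2.0, 0.5, -1.0],   # correct class wins by more than the margin
                [0.2, 0.4, 0.1]]    # a wrong class outscores the correct one
frame_labels = [0, 0]

losses = [multiclass_hinge_loss(s, y) for s, y in zip(frame_scores, frame_labels)]
print(losses)
```

The first frame incurs zero loss because the correct class already leads by more than the margin, while the second frame is penalised (a loss of about 1.2). Unlike a softmax cross-entropy loss, frames that are classified with sufficient margin contribute nothing to the objective.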

41 citations

Journal Article

Parametric subspace modeling of speech transitions

K. Reinhard¹, Mahesan Niranjan¹

University of Cambridge¹

01 Feb 1999-Speech Communication

TL;DR: Attempts to capture segmental transition information for speech recognition tasks, using the Principal Curves method and the Generative Topographic Map technique to describe the temporal evolution in terms of latent variables.

41 citations

Posted Content

Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation

Ehsan Hosseini-Asl¹, Yingbo Zhou¹, Caiming Xiong¹, Richard Socher¹

Salesforce.com¹

01 Jul 2018-arXiv: Learning

TL;DR: This paper proposes an augmented cyclic adversarial learning model that enforces the cycle-consistency constraint via an external task specific model, which encourages the preservation of task-relevant content as opposed to exact reconstruction.

Abstract: Training a model to perform a task typically requires a large amount of data from the domains in which the task will be applied. However, it is often the case that data are abundant in some domains but scarce in others. Domain adaptation deals with the challenge of adapting a model trained from a data-rich source domain to perform well in a data-poor target domain. In general, this requires learning plausible mappings between domains. CycleGAN is a powerful framework that efficiently learns to map inputs from one domain to another using adversarial training and a cycle-consistency constraint. However, the conventional approach of enforcing cycle-consistency via reconstruction may be overly restrictive in cases where one or more domains have limited training data. In this paper, we propose an augmented cyclic adversarial learning model that enforces the cycle-consistency constraint via an external task-specific model, which encourages the preservation of task-relevant content as opposed to exact reconstruction. We explore digit classification in a low-resource setting in supervised, semi-supervised, and unsupervised settings, as well as in a high-resource unsupervised setting. In the low-resource supervised setting, the results show that our approach improves absolute performance by 14% and 4% when adapting SVHN to MNIST and vice versa, respectively, which outperforms unsupervised domain adaptation methods that require high-resource unlabeled target domain data. Moreover, using only a few unsupervised target data, our approach can still outperform many high-resource unsupervised models. In speech domains, we similarly adopt a speech recognition model from each domain as the task-specific model. Our approach improves absolute performance of speech recognition by 2% for female speakers in the TIMIT dataset, where the majority of training samples are from male voices.

41 citations

Performance

Metrics

Papers: 1,488
Citations: 68,688

No. of papers in the topic in previous years
Year	Papers
2023	24
2022	62
2021	67
2020	86
2019	77
2018	95
