Author

J.J. Godfrey

Bio: J.J. Godfrey is an academic researcher from Texas Instruments. The author has contributed to research in topics: Speaker recognition & Word error rate. The author has an h-index of 5 and has co-authored 6 publications receiving 1,950 citations.

Papers
Proceedings ArticleDOI
23 Mar 1992
TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large-vocabulary speech recognition.
Abstract: SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500 conversations by 500 speakers from around the US were collected automatically over T1 lines at Texas Instruments. Designed for training and testing of a variety of speech processing algorithms, especially in speaker verification, it has over an hour of speech from each of 50 speakers, and several minutes each from hundreds of others. A time-aligned, word-for-word transcription accompanies each recording.

2,102 citations

Proceedings ArticleDOI
23 Mar 1992
TL;DR: This method successfully aligns transcriptions with speech in unconstrained 5 to 10 min conversations collected over long-distance telephone lines and requires minimal manual processing and generally produces correct alignments despite the challenging nature of the data.
Abstract: A method for automatic time alignment of orthographically transcribed speech using supervised speaker-independent automatic speech recognition based on the orthographic transcription, an online dictionary, and HMM phone models is presented. This method successfully aligns transcriptions with speech in unconstrained 5 to 10 min conversations collected over long-distance telephone lines. It requires minimal manual processing and generally produces correct alignments despite the challenging nature of the data. The robustness and efficiency of the method make it a practical tool for very large speech corpora.

28 citations
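The alignment described above ultimately rests on Viterbi decoding against a left-to-right model built from the transcription. As a rough illustration only (not the authors' system), a toy forced aligner over a known sequence of transcription units, given precomputed per-frame log-likelihoods, might look like:

```python
import numpy as np

def force_align(loglik):
    """Toy forced alignment by Viterbi over a left-to-right chain.

    loglik: (T, S) array, where loglik[t, s] is the log-likelihood of
    frame t under unit s of the known transcription (a stand-in for
    the HMM phone models). Each unit may repeat (self-loop), units
    must appear in order, and every unit must be visited.
    Returns the unit index assigned to each frame.
    """
    T, S = loglik.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = loglik[0, 0]          # must start in the first unit
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s] = move + loglik[t, s]
                back[t, s] = s - 1
            else:
                score[t, s] = stay + loglik[t, s]
                back[t, s] = s
    path = [S - 1]                      # must end in the final unit
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

Frame boundaries between consecutive units in the returned path give the word/phone time alignment.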

Proceedings ArticleDOI
12 May 1998
TL;DR: A set of experiments which explore the use of syllables for recognition of continuous alphadigit utterances on the Switchboard corpus find the performance of the base syllable system better than a crossword triphone system while requiring a small portion of the resources necessary for triphone systems.
Abstract: We present a set of experiments which explore the use of syllables for recognition of continuous alphadigit utterances. In this system, syllables are used as the primary unit of recognition. This work was motivated by our need to verify and isolate phenomena seen when performing syllable-based experiments on the Switchboard corpus. The performance of our base syllable system is better than a crossword triphone system while requiring a small portion of the resources necessary for triphone systems. All experiments were performed on the OGI Alphadigits corpus, which consists of telephone-bandwidth alphadigit strings. The word error rate (WER) of the best syllable system (context-independent syllables) reported here is 11.1% compared to 12.2% for a crossword triphone system.

17 citations

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A system for name dialing in the car and results under three driving conditions using real-life data are described and a simple algorithm to reject out-of-vocabulary names is outlined.
Abstract: We describe a system for name dialing in the car and present results under three driving conditions using real-life data. The names are enrolled in the parked-car condition (engine off), and we describe two approaches for endpointing them, energy-based and recognition-based schemes, which result in word-based and phone-based models, respectively. We outline a simple algorithm to reject out-of-vocabulary names. PMC is used for noise compensation. When tested on an internally collected twenty-speaker database, for a list size of 50 and a hand-held microphone, the performance averaged over all driving conditions and speakers was 98%/92% (IV accuracy/OOV rejection); for the hands-free data, it was 98%/80%.

11 citations

Proceedings ArticleDOI
15 Mar 1999
TL;DR: A linear regression-based model adaptation procedure is proposed to reduce the mismatch between the two acoustic conditions of the HMM by transforming the HMMs trained in a quiet condition to maximize the likelihood of observing the adaptation utterances.
Abstract: In the absence of HMMs trained with speech collected in the target environment, one may use HMMs trained with a large amount of speech collected in another recording condition (e.g., quiet office, with high quality microphone). However, this may result in poor performance because of the mismatch between the two acoustic conditions. We propose a linear regression-based model adaptation procedure to reduce such a mismatch. With some adaptation utterances collected for the target environment, the procedure transforms the HMMs trained in a quiet condition to maximize the likelihood of observing the adaptation utterances. The transformation must be designed to maintain speaker-independence of the HMM. Our speaker-independent test results show that with this procedure about 1% digit error rate can be achieved for hands-free recognition, using target environment speech from only 20 speakers.

10 citations
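The paper above adapts HMMs with a linear regression-based transform. As a much-simplified sketch, assuming identity covariances and adapting only the Gaussian means with one global affine transform fit by weighted least squares (the function names and the sufficient-statistics interface are illustrative, not from the paper):

```python
import numpy as np

def estimate_mean_transform(means, frame_sums, frame_counts):
    """Estimate a global affine transform (A, b) of Gaussian means.

    means:        (G, D) quiet-condition mean vectors.
    frame_sums:   (G, D) per-Gaussian sums of adaptation frames
                  (from aligning adaptation utterances to Gaussians).
    frame_counts: (G,)   per-Gaussian frame counts.

    Solves min_{A,b} sum_g c_g * || xbar_g - (A mu_g + b) ||^2,
    a simplified, identity-covariance form of regression-based
    mean adaptation.
    """
    G, D = means.shape
    xbar = frame_sums / frame_counts[:, None]       # per-Gaussian frame means
    ext = np.hstack([means, np.ones((G, 1))])       # (G, D+1): [mu; 1]
    w = np.sqrt(frame_counts)[:, None]              # weight rows by sqrt(c_g)
    W, *_ = np.linalg.lstsq(ext * w, xbar * w, rcond=None)
    A, b = W[:D].T, W[D]
    return A, b

def adapt_means(means, A, b):
    """Apply the estimated transform to all means at once."""
    return means @ A.T + b
```

Because a single transform is shared across all Gaussians and estimated from many speakers' data, the adapted models remain speaker-independent, matching the constraint stated in the abstract.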


Cited by
Proceedings ArticleDOI
18 Apr 2019
TL;DR: This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
Abstract: We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.

2,758 citations
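The masking components of the SpecAugment policy are simple to sketch. Below is a minimal illustration of frequency and time masking on a (time x frequency) spectrogram; the time-warping step is omitted, and the mask widths here are illustrative defaults, not the paper's tuned policies:

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=100, rng=None):
    """Apply frequency and time masking to a log-mel spectrogram.

    spec: array of shape (time_steps, freq_channels).
    Masked regions are set to zero. For each mask, a width is drawn
    uniformly from [0, max_width], then a start position is drawn
    uniformly over valid offsets.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    T, F = out.shape
    for _ in range(num_freq_masks):          # mask blocks of channels
        f = rng.integers(0, freq_mask_width + 1)
        f0 = rng.integers(0, F - f + 1)
        out[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):          # mask blocks of time steps
        t = rng.integers(0, time_mask_width + 1)
        t0 = rng.integers(0, T - t + 1)
        out[t0:t0 + t, :] = 0.0
    return out
```

Because the masks are drawn fresh for every training example on every epoch, the network never sees the same corrupted view twice, which is what makes this such an effective regularizer.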

Journal ArticleDOI
TL;DR: This work surveys the most widely used algorithms for smoothing n-gram language models, presents an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing-algorithm efficacy in detail.

1,948 citations
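The Jelinek and Mercer (1980) technique referenced above is linear interpolation of higher- and lower-order estimates. A minimal bigram sketch, with a fixed interpolation weight (in practice the weight is tuned on held-out data):

```python
from collections import Counter

def jelinek_mercer(bigrams, unigrams, lam=0.7):
    """Interpolated (Jelinek-Mercer) bigram smoothing.

    P(w | v) = lam * c(v, w) / c(v)  +  (1 - lam) * c(w) / N

    Mixes the maximum-likelihood bigram estimate with the unigram
    distribution, so unseen bigrams keep nonzero probability.
    Returns a function prob(v, w).
    """
    big = Counter(bigrams)
    uni = Counter(unigrams)
    n = sum(uni.values())

    def prob(v, w):
        ml = big[(v, w)] / uni[v] if uni[v] else 0.0
        return lam * ml + (1 - lam) * uni[w] / n
    return prob

# Usage: build counts from a token stream.
tokens = "a b a b a c".split()
prob = jelinek_mercer(list(zip(tokens, tokens[1:])), tokens, lam=0.5)
```

For each history v, the interpolated estimates still sum to one over the vocabulary, since each component distribution does.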

Book
29 Jun 2009
TL;DR: This introductory book presents some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi- supervised support vector machines, and discusses their basic mathematical formulation.
Abstract: Semi-supervised learning is a learning paradigm concerned with the study of how computers and natural systems such as humans learn in the presence of both labeled and unlabeled data. Traditionally, learning has been studied either in the unsupervised paradigm (e.g., clustering, outlier detection) where all the data is unlabeled, or in the supervised paradigm (e.g., classification, regression) where all the data is labeled. The goal of semi-supervised learning is to understand how combining labeled and unlabeled data may change the learning behavior, and design algorithms that take advantage of such a combination. Semi-supervised learning is of great interest in machine learning and data mining because it can use readily available unlabeled data to improve supervised learning tasks when the labeled data is scarce or expensive. Semi-supervised learning also shows potential as a quantitative tool to understand human category learning, where most of the input is self-evidently unlabeled. In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. For each model, we discuss its basic mathematical formulation. The success of semi-supervised learning depends critically on some underlying assumptions. We emphasize the assumptions made by each model and give counterexamples when appropriate to demonstrate the limitations of the different models. In addition, we discuss semi-supervised learning for cognitive psychology. Finally, we give a computational learning theoretic perspective on semi-supervised learning, and we conclude the book with a brief discussion of open questions in the field.

1,913 citations
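Self-training, the first model the book above presents, is easy to sketch: fit on the labeled pool, pseudo-label the unlabeled points the model is confident about, add them to the pool, and repeat. A toy version using a nearest-centroid base learner (the base classifier and the softmax-style confidence measure are illustrative choices, not from the book):

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Self-training loop with a nearest-centroid base classifier.

    Each round: fit class centroids on the labeled pool, score
    unlabeled points by a softmax over negative centroid distances,
    and move points whose confidence exceeds the threshold into the
    labeled pool with their predicted label.
    Returns the grown labeled pool (X, y).
    """
    X_lab, y_lab = X_lab.copy(), np.asarray(y_lab).copy()
    X_un = X_unlab.copy()
    classes = np.unique(y_lab)
    for _ in range(max_rounds):
        if len(X_un) == 0:
            break
        centroids = np.array([X_lab[y_lab == c].mean(axis=0)
                              for c in classes])
        d = np.linalg.norm(X_un[:, None, :] - centroids[None, :, :], axis=2)
        conf = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        best = conf.argmax(axis=1)
        keep = conf.max(axis=1) >= threshold
        if not keep.any():                 # nothing confident: stop early
            break
        X_lab = np.vstack([X_lab, X_un[keep]])
        y_lab = np.concatenate([y_lab, classes[best[keep]]])
        X_un = X_un[~keep]
    return X_lab, y_lab
```

The sketch also exposes the method's central assumption, which the book stresses: self-training helps only when the model's confident predictions are actually correct; a confident early mistake gets amplified on every subsequent round.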

Journal ArticleDOI
01 Oct 1980

1,565 citations