Author

F.K. Soong

Other affiliations: Bell Labs

Bio: F.K. Soong is an academic researcher from Alcatel-Lucent. The author has contributed to research on hidden Markov models and speaker recognition, has an h-index of 22, and has co-authored 38 publications that have received 3,216 citations. Previous affiliations of F.K. Soong include Bell Labs.

Papers

Proceedings Article•DOI•

Line spectrum pair (LSP) and speech data compression

F.K. Soong¹, Biing-Hwang Juang¹•Institutions (1)

Bell Labs¹

19 Mar 1984

TL;DR: An expression for spectral sensitivity with respect to single LSP frequency deviation is derived such that some insight on their quantization effects can be obtained and results on multi-pulse LPC using LSP for spectral information compression are presented.

Abstract: Line Spectrum Pair (LSP) was first introduced by Itakura [1,2] as an alternative LPC spectral representation. It was found that this new representation has such interesting properties as (1) all zeros of LSP polynomials are on the unit circle, (2) the corresponding zeros of the symmetric and anti-symmetric LSP polynomials are interlaced, and (3) the reconstructed LPC all-pole filter preserves its minimum phase property if (1) and (2) are kept intact through a quantization procedure. In this paper we prove all these properties via a "phase function." The statistical characteristics of LSP frequencies are investigated by analyzing a speech data base. In addition, we derive an expression for spectral sensitivity with respect to single LSP frequency deviation such that some insight on their quantization effects can be obtained. Results on multi-pulse LPC using LSP for spectral information compression are finally presented.

506 citations
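
As an illustrative sketch of properties (1) and (2) in the abstract above, the following Python forms the symmetric and anti-symmetric polynomials from an LPC inverse filter and locates their zeros on the unit circle. The definitions P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) are the commonly used textbook construction and are assumed here, since the abstract does not spell them out. The function names, the grid-plus-bisection search, and the second-order example filter used to exercise it are hypothetical.

```python
import math

def lsp_polynomials(a):
    """Build P and Q from A(z) = sum_k a[k] z^-k (assumed construction)."""
    ext = list(a) + [0.0]                        # pad A(z) to degree p + 1
    rev = ext[::-1]                              # coefficients of z^-(p+1) A(1/z)
    p_poly = [x + y for x, y in zip(ext, rev)]   # symmetric coefficients
    q_poly = [x - y for x, y in zip(ext, rev)]   # anti-symmetric coefficients
    return p_poly, q_poly

def amplitude(coeffs, w, symmetric):
    """Real amplitude of the polynomial at z = e^{jw}, up to a constant factor,
    after the linear-phase term e^{-jw(p+1)/2} is factored out."""
    mid = (len(coeffs) - 1) / 2.0
    trig = math.cos if symmetric else math.sin
    return sum(c * trig(w * (k - mid)) for k, c in enumerate(coeffs))

def unit_circle_zeros(coeffs, symmetric, grid=2000):
    """Find zero frequencies strictly inside (0, pi) by sign change and bisection."""
    eps = 1e-6
    ws = [eps + (math.pi - 2 * eps) * i / grid for i in range(grid + 1)]
    zeros = []
    for lo, hi in zip(ws, ws[1:]):
        f_lo = amplitude(coeffs, lo, symmetric)
        if f_lo * amplitude(coeffs, hi, symmetric) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = amplitude(coeffs, mid, symmetric)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
    return zeros
```

For a minimum-phase A(z), the frequencies returned for P and Q should alternate along (0, π), which is the interlacing property. Any trivial zeros at z = ±1 sit at the endpoints of the interval and are excluded from the search.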

Proceedings Article•DOI•

A vector quantization approach to speaker recognition

F.K. Soong¹, Aaron E. Rosenberg², Lawrence R. Rabiner², Biing-Hwang Juang²•Institutions (2)

Bell Labs¹, AT&T²

26 Apr 1985

TL;DR: A vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker, and a set of such codebooks was used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule.

Abstract: In this study a vector quantization (VQ) codebook was used as an efficient means of characterizing the short-time spectral features of a speaker. A set of such codebooks was then used to recognize the identity of an unknown speaker from his/her unlabelled spoken utterances based on a minimum distance (distortion) classification rule. A series of speaker recognition experiments was performed using a 100-talker (50 male and 50 female) telephone recording database consisting of isolated digit utterances. For ten random but different isolated digits, over 98% speaker identification accuracy was achieved. The effects, on performance, of different system parameters such as codebook sizes, the number of test digits, phonetic richness of the text, and difference in recording sessions were also studied in detail.

493 citations
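
The minimum-distortion classification rule described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's system: the codebook is trained with plain k-means using a deterministic initialization, the distortion is squared Euclidean distance, and all function names are hypothetical. The abstract does not specify the training algorithm or the distortion measure.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors (assumed distortion)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def train_codebook(vectors, size, iters=20):
    """Train one speaker's codebook with k-means (an assumed stand-in trainer)."""
    step = len(vectors) / size
    # Deterministic start: evenly spaced training vectors.
    codebook = [list(vectors[int(i * step)]) for i in range(size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda j: sq_dist(v, codebook[j]))
            cells[nearest].append(v)
        for j, cell in enumerate(cells):
            if cell:  # leave a codeword unchanged if its cell is empty
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def avg_distortion(vectors, codebook):
    """Average distance from each test vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))
```

Here `codebooks` is a dictionary mapping a speaker label to that speaker's trained codebook, and `vectors` is the sequence of short-time spectral feature vectors extracted from the unknown utterance.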

Journal Article•DOI•

On the use of instantaneous and transitional spectral information in speaker recognition

F.K. Soong¹, Aaron E. Rosenberg¹•Institutions (1)

Bell Labs¹

01 Jun 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition, and simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly.

Abstract: The use of instantaneous and transitional spectral representations of spoken utterances for speaker recognition is investigated. Linear-predictive-coding (LPC)-derived cepstral coefficients are used to represent instantaneous spectral information, and best linear fits of each cepstral coefficient over a specified time window are used to represent transitional information. An evaluation has been carried out using a database of isolated digit utterances over dialed-up telephone lines by 10 talkers. Two vector quantization (VQ) codebooks, instantaneous and transitional, were constructed from each speaker's training utterances. The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition. A rectangular window of approximately 100 ms duration provides an effective estimate of the transitional spectral features for speaker recognition. Also, simple transmission channel variations are shown to affect both the instantaneous spectral representations and the corresponding recognition performance significantly, while the transitional representations and performance are relatively resistant.

278 citations
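
The "best linear fit of each cepstral coefficient over a specified time window" mentioned in the abstract above amounts to a least-squares slope per coefficient. A minimal sketch follows, assuming a symmetric rectangular window of `2 * half_width + 1` frames and edge frames repeated at the utterance boundaries. Both choices, along with the function name, are assumptions for illustration; the frame rate that would map the window to roughly 100 ms is not given in the abstract.

```python
def delta_features(frames, half_width=2):
    """Least-squares slope of each coefficient over a sliding rectangular window.

    frames: list of per-frame cepstral vectors (lists of floats).
    Returns one slope vector per frame.
    """
    n = len(frames)
    dim = len(frames[0])
    offsets = range(-half_width, half_width + 1)
    denom = sum(k * k for k in offsets)   # sum of squared time offsets
    deltas = []
    for t in range(n):
        row = []
        for d in range(dim):
            num = 0.0
            for k in offsets:
                idx = min(max(t + k, 0), n - 1)   # repeat edge frames (assumption)
                num += k * frames[idx][d]
            row.append(num / denom)
        deltas.append(row)
    return deltas
```

Because the time offsets are symmetric about zero, the slope of the fitted line reduces to the weighted sum above. A coefficient that is constant over the window yields a slope of zero, which is consistent with the abstract's observation that transitional features resist simple channel variations.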

Proceedings Article•DOI•

A tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition

F.K. Soong¹, Eng-Fong Huang¹•Institutions (1)

Bell Labs¹

14 Apr 1991

TL;DR: A novel tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition is presented, which is different from the traditional time synchronous Viterbi search in its ability to find not just the best but the N best paths of different word content.

Abstract: A novel tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition is presented. The search consists of a forward time-synchronous trellis search and a backward time-asynchronous tree search. The Viterbi algorithm is used for recording the scores of all partial paths in a trellis time synchronously. Then a backward A* algorithm based tree search is used to extend partial paths time asynchronously. Extended partial paths in the backward tree search are rank ordered in a stack by their corresponding best possible scores of the remaining paths which are prerecorded in the forward trellis path map. In each path growing cycle, the current best partial path, which is at the top of the stack, is extended by the best possible one arc (word) extension. The tree-trellis search is different from the traditional time synchronous Viterbi search in its ability to find not just the best but the N best paths of different word content.

242 citations
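
The two-pass idea in the abstract above (a forward best-score pass, then a backward stack search ranked by the prerecorded forward scores) can be illustrated on a small word lattice. This is a simplified sketch under stated assumptions, not the paper's algorithm: it runs on a directed acyclic graph of word arcs with log scores instead of a frame-level trellis, and the function name and data layout are hypothetical.

```python
import heapq
from itertools import count

def n_best(edges, topo_order, start, end, n):
    """Return up to n best (score, words) paths from start to end.

    edges: dict node -> list of (next_node, word, log_score); higher is better.
    topo_order: all nodes listed in topological order.
    """
    # Forward pass: best score from start to every node (Viterbi-style).
    forward = {node: float("-inf") for node in topo_order}
    forward[start] = 0.0
    incoming = {node: [] for node in topo_order}
    for node in topo_order:
        for nxt, word, score in edges.get(node, []):
            incoming[nxt].append((node, word, score))
            forward[nxt] = max(forward[nxt], forward[node] + score)

    # Backward pass: stack ordered by (backward score so far + best forward score).
    tie = count()  # tie-breaker so the heap never compares word tuples
    stack = [(-forward[end], next(tie), end, 0.0, ())]
    results, seen = [], set()
    while stack and len(results) < n:
        _, _, node, back, words = heapq.heappop(stack)
        if node == start:
            if words not in seen:        # keep hypotheses with distinct word content
                seen.add(words)
                results.append((back, list(words)))
            continue
        for prev, word, score in incoming[node]:
            if forward[prev] == float("-inf"):
                continue                 # prev is unreachable from start
            total = forward[prev] + score + back
            heapq.heappush(stack, (-total, next(tie), prev, back + score, (word,) + words))
    return results
```

Because the forward score is the exact best completion for every partial backward path, complete hypotheses leave the stack in order of decreasing total score, so the search can stop as soon as N of them have been found.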

Proceedings Article•DOI•

On the use of instantaneous and transitional spectral information in speaker recognition

F.K. Soong¹, Aaron E. Rosenberg¹•Institutions (1)

Bell Labs¹

01 Apr 1986

TL;DR: The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition, and simple transmission channel variations are shown to affect the instantaneous spectral representations and the corresponding recognition performance significantly, while the transitional representations and performance are relatively resistant.

Abstract: The use of instantaneous and transitional spectral representations of spoken utterances for speaker recognition is investigated. LPC-derived cepstral coefficients are used to represent instantaneous spectral information, and best linear fits of each cepstral coefficient over a specified time window are used to represent transitional information. An evaluation has been carried out using a data base of isolated digit utterances over dialed-up telephone lines by 10 talkers. Two vector quantization (VQ) codebooks, instantaneous and transitional, are constructed from training utterances for each speaker. The experimental results show that the instantaneous and transitional representations are relatively uncorrelated, thus providing complementary information for speaker recognition. A rectangular window of approximately 100-150 ms duration provides an effective estimate of spectral transitions for speaker recognition. Also, simple transmission channel variations are shown to affect the instantaneous spectral representations and the corresponding recognition performance significantly, while the transitional representations and performance are relatively resistant.

228 citations

Cited by

Journal Article•DOI•

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner¹•Institutions (1)

Bell Labs¹

01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.

21,819 citations
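
To make the notion of hidden states concrete, where each observation is a probabilistic function of the state, here is a minimal sketch of scoring an observation sequence under a discrete HMM with the standard forward recursion. The function name and parameter layout are hypothetical, and the abstract itself does not list which techniques the tutorial gives for each of the three problems.

```python
def forward_likelihood(init, trans, emit, observations):
    """Probability of an observation sequence under a discrete HMM.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o while in state s
    observations: non-empty list of symbol indices
    """
    states = range(len(init))
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs]
            for s in states
        ]
    return sum(alpha)
```

In a coin-tossing setting, each state could stand for a hidden coin with its own bias and the symbols 0 and 1 for heads and tails. For long sequences the products underflow, so a practical version would scale `alpha` at each step or work in the log domain.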

Journal Article•DOI•

Speaker Verification Using Adapted Gaussian Mixture Models

Douglas A. Reynolds¹, Thomas F. Quatieri¹, Robert B. Dunn¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2000-Digital Signal Processing

TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations

Journal Article•DOI•

Robust text-independent speaker identification using Gaussian mixture speaker models

Douglas A. Reynolds¹, Richard Rose²•Institutions (2)

Massachusetts Institute of Technology¹, AT&T²

01 Jan 1995-IEEE Transactions on Speech and Audio Processing

TL;DR: The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity, and the Gaussian mixture speaker model is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.

Abstract: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49 speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5 second clean speech utterances and 80.8% accuracy using 15 second telephone speech utterances with a 49 speaker population and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.

3,134 citations
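
As a rough illustration of how a Gaussian mixture speaker model can be used for identification, the sketch below scores a set of feature frames against each speaker's mixture and picks the highest total log-likelihood. Diagonal covariances, the maximum-likelihood decision rule, and all names are assumptions made for this example. Model training and the robustness techniques examined in the paper are not shown.

```python
import math

def log_gmm(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        log_normal = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mu, var)
        )
        terms.append(math.log(w) + log_normal)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def identify_speaker(frames, models):
    """Pick the speaker whose model gives the highest total log-likelihood.

    models: dict name -> (weights, means, variances), one entry per speaker.
    """
    return max(
        models,
        key=lambda name: sum(log_gmm(x, *models[name]) for x in frames),
    )
```

Summing per-frame log-likelihoods treats the frames as independent, which is a simplifying assumption of this sketch.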

Journal Article•DOI•

Speaker recognition: a tutorial

J.P. Campbell Jr.¹•Institutions (1)

Johns Hopkins University¹

01 Sep 1997

TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, and a new automatic speaker-recognition system is given that performs with 98.9% correct identification.

Abstract: A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person's claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.

1,686 citations

Journal Article•DOI•

Hidden Markov models for speech recognition

Biing-Hwang Juang¹, Lawrence R. Rabiner¹•Institutions (1)

Bell Labs¹

01 Aug 1991-Technometrics

TL;DR: The role of statistical methods in this powerful technology as applied to speech recognition is addressed and a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations are discussed.

Abstract: The use of hidden Markov models for speech recognition has become predominant in the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, and so forth; and the ease of implementation of the overall recognition system. In this expository article, we address the role of statistical methods in this powerful technology as applied to speech recognition and discuss a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations.

1,480 citations
