Home
/
Authors
/
Kai-Fu Lee

Author

Kai-Fu Lee

Other affiliations: Carnegie Mellon University, Apple Inc.

Bio: Kai-Fu Lee is an academic researcher from Microsoft. The author has contributed to research in topics: Hidden Markov model & Word error rate. The author has an hindex of 41, co-authored 87 publications receiving 8345 citations. Previous affiliations of Kai-Fu Lee include Carnegie Mellon University & Apple Inc..

Papers published on a yearly basis

2018
2011
2006
2004
2002
2000
1996
1995
1994
1993
1992
1991
1990
1989
1988
1986

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Speaker-independent phone recognition using hidden Markov models

[...]

Kai-Fu Lee¹, H.-W. Hon¹•Institutions (1)

Carnegie Mellon University¹

01 Nov 1989-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data, and can be used as benchmarks to evaluate future systems.

...read moreread less

Abstract: Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs) the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems. >

...read moreread less

895 citations

Patent•

Search engine with natural language-based robust parsing for user query and relevance feedback learning

[...]

Haifeng Wang, Kai-Fu Lee, Qiang Yang

24 Aug 2000

TL;DR: In this paper, a search engine architecture is designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches, which includes a natural language parser that parses user queries and extracts syntactic and semantic information.

...read moreread less

Abstract: A search engine architecture is designed to handle a full range of user queries, from complex sentence-based queries to simple keyword searches. The search engine architecture includes a natural language parser that parses a user query and extracts syntactic and semantic information. The parser is robust in the sense that it not only returns fully-parsed results (e.g., a parse tree), but is also capable of returning partially-parsed fragments in those cases where more accurate or descriptive information in the user query is unavailable. A question matcher is employed to match the fully-parsed output and the partially-parsed fragments to a set of frequently asked questions (FAQs) stored in a database. The question matcher then correlates the questions with a group of possible answers arranged in standard templates that represent possible solutions to the user query. The search engine architecture also has a keyword searcher to locate other possible answers by searching on any keywords returned from the parser. The answers returned from the question matcher and the keyword searcher are presented to the user for confirmation as to which answer best represents the user's intentions when entering the initial search query. The search engine architecture logs the queries, the answers returned to the user, and the user's confirmation feedback in a log database. The search engine has a log analyzer to evaluate the log database to glean information that improves performance of the search engine over time by training the parser and the question matcher.

...read moreread less

616 citations

Book•

Readings in speech recognition

[...]

Alex Waibel, Kai-Fu Lee

01 May 1990

TL;DR: This chapter discusses four main approaches to speech recognition: template-based, knowledge-Based, Stochastic, connectionist, and connectionist.

...read moreread less

Abstract: Chapter 1 Why Study Speech Recognition? Chapter 2 Problems and Opportunities Chapter 3 Speech Analysis Chapter 4 Template-Based Approaches Chapter 5 Knowledge-Based Approaches Chapter 6 Stochastic Approaches Chapter 7 Connectionist Approaches Chapter 8 Language Processing for Speech Recognition Chapter 9 Systems

...read moreread less

566 citations

Book•DOI•

Automatic Speech Recognition

[...]

Kai-Fu Lee

01 Jan 1989

528 citations

Journal Article•DOI•

An overview of the SPHINX speech recognition system

[...]

Kai-Fu Lee¹, H.-W. Hon¹, Raj Reddy¹•Institutions (1)

Carnegie Mellon University¹

01 May 1990-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition, based on discrete hidden Markov models with LPC- (linear-predictive-coding) derived parameters.

...read moreread less

Abstract: A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with LPC- (linear-predictive-coding) derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96%, respectively, on a 997-word task. >

...read moreread less

487 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Gradient-based learning applied to document recognition

[...]

Yann LeCun¹, Léon Bottou², Léon Bottou³, Yoshua Bengio⁴, Yoshua Bengio⁵, Yoshua Bengio³, Patrick Haffner³ - Show less +3 more•Institutions (5)

Bell Labs¹, École Normale Supérieure², AT&T³, Alcatel-Lucent⁴, École Polytechnique de Montréal⁵

01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

...read moreread less

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

...read moreread less

42,067 citations

Journal Article•DOI•

A tutorial on hidden Markov models and selected applications in speech recognition

[...]

Lawrence R. Rabiner¹•Institutions (1)

Bell Labs¹

01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

...read moreread less

Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described. >

...read moreread less

21,819 citations

Book•

Foundations of Statistical Natural Language Processing

[...]

Christopher D. Manning¹, Hinrich Schütze²•Institutions (2)

Stanford University¹, PARC²

28 May 1999

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.

...read moreread less

Abstract: Statistical approaches to processing natural language text have become dominant in recent years This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear The book contains all the theory and algorithms needed for building NLP tools It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications

...read moreread less

9,295 citations

Proceedings Article•DOI•

Speech recognition with deep recurrent neural networks

[...]

Alex Graves¹, Abdelrahman Mohamed¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

26 May 2013

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

...read moreread less

Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

...read moreread less

7,316 citations

Posted Content•

Speech Recognition with Deep Recurrent Neural Networks

[...]

Alex Graves¹, Abdelrahman Mohamed¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

22 Mar 2013-arXiv: Neural and Evolutionary Computing

TL;DR: In this paper, deep recurrent neural networks (RNNs) are used to combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

...read moreread less

Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates \emph{deep recurrent neural networks}, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.

...read moreread less

5,310 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse