Home
/
Authors
/
Keith Hall

Author

Keith Hall

Other affiliations: Johns Hopkins University, Brown University

Bio: Keith Hall is an academic researcher from Google. The author has contributed to research in topics: Language model & Parsing. The author has an hindex of 24, co-authored 65 publications receiving 3620 citations. Previous affiliations of Keith Hall include Johns Hopkins University & Brown University.

Papers published on a yearly basis

2022
2021
2020
2019
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2000
1910

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

[...]

Eneko Agirre¹, Enrique Alfonseca², Keith Hall², Jana Kravalova³, Marius Pasca², Aitor Soroa¹ - Show less +2 more•Institutions (3)

University of the Basque Country¹, Google², Charles University in Prague³

31 May 2009

TL;DR: This paper presents and compares WordNet-based and distributional similarity approaches, and pioneer cross-lingual similarity, showing that the methods are easily adapted for a cross-lingsual task with minor losses.

...read moreread less

Abstract: This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provide the best results in their class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.

...read moreread less

936 citations

Proceedings Article•

Universal Dependency Annotation for Multilingual Parsing

[...]

Ryan McDonald¹, Joakim Nivre², Yvonne Quirmbach-Brundage, Yoav Goldberg³, Dipanjan Das¹, Kuzman Ganchev¹, Keith Hall¹, Slav Petrov¹, Hao Zhang¹, Oscar Täckström¹, Claudia Bedini, Núria Bertomeu Castelló, Jungmee Lee - Show less +9 more•Institutions (3)

Google¹, Uppsala University², Bar-Ilan University³

01 Aug 2013

TL;DR: A new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean is presented, made freely available in order to facilitate research on multilingual dependency parsing.

...read moreread less

Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing. 1

...read moreread less

489 citations

Proceedings Article•

Correlated-Q learning

[...]

Amy Greenwald¹, Keith Hall¹•Institutions (1)

Brown University¹

21 Aug 2003

TL;DR: Correlated-Q (CE-Q) learning is introduced, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept that generalizes both Nash-Q and Friend-and-Foe-Q.

...read moreread less

Abstract: This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.

...read moreread less

436 citations

Proceedings Article•

Multi-Source Transfer of Delexicalized Dependency Parsers

[...]

Ryan McDonald¹, Slav Petrov¹, Keith Hall¹•Institutions (1)

Google¹

27 Jul 2011

TL;DR: This work demonstrates that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers and shows that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers.

...read moreread less

Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.

...read moreread less

359 citations

Proceedings Article•

Distributed Training Strategies for the Structured Perceptron

[...]

Ryan McDonald¹, Keith Hall¹, Gideon S. Mann¹•Institutions (1)

Google¹

02 Jun 2010

TL;DR: This paper investigates distributed training strategies for the structured perceptron as a means to reduce training times when computing clusters are available and looks at two strategies and provides convergence bounds for a particular mode of distributed structured perceptrons training based on iterative parameter mixing (or averaging).

...read moreread less

Abstract: Perceptron training is widely applied in the natural language processing community for learning complex structured models. Like all structured prediction learning frameworks, the structured perceptron can be costly to train as training complexity is proportional to inference, which is frequently non-linear in example sequence length. In this paper we investigate distributed training strategies for the structured perceptron as a means to reduce training times when computing clusters are available. We look at two strategies and provide convergence bounds for a particular mode of distributed structured perceptron training based on iterative parameter mixing (or averaging). We present experiments on two structured prediction problems -- named-entity recognition and dependency parsing -- to highlight the efficiency of this method.

...read moreread less

307 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14

Collapse

Cited by

PDF

Open Access

More filters

Book•

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

[...]

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein² - Show less +1 more•Institutions (2)

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

...read moreread less

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

...read moreread less

17,433 citations

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

10,141 citations

Posted Content•

Communication-Efficient Learning of Deep Networks from Decentralized Data

[...]

H. Brendan McMahan¹, Eider Moore¹, Daniel Ramage¹, Seth Hampson, Blaise Aguera y Arcas¹ - Show less +1 more•Institutions (1)

Google¹

17 Feb 2016-arXiv: Learning

TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.

...read moreread less

Abstract: Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.

...read moreread less

5,936 citations

Proceedings Article•

Large Scale Distributed Deep Networks

[...]

Jeffrey Dean¹, Greg S. Corrado¹, Rajat Monga¹, Kai Chen¹, Matthieu Devin¹, Mark Z. Mao¹, Marc'Aurelio Ranzato¹, Andrew W. Senior¹, Paul A. Tucker¹, Ke Yang¹, Quoc V. Le¹, Andrew Y. Ng¹ - Show less +8 more•Institutions (1)

Google¹

03 Dec 2012

TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.

...read moreread less

Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly- sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.

...read moreread less

3,475 citations

Reference Entry•DOI•

IEEE Transactions on Pattern Analysis and Machine Intelligence

[...]

King-Sun Fu

15 Oct 2004

2,118 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse