Kevin Duh

Other affiliations: University of Washington, Nara Institute of Science and Technology, Nippon Telegraph and Telephone

Bio: Kevin Duh is an academic researcher at Johns Hopkins University who has contributed to research on machine translation and language models. The author has an h-index of 38 and has co-authored 205 publications, which have received 5,369 citations. Previous affiliations include the University of Washington and the Nara Institute of Science and Technology.

Papers published on a yearly basis (chart, 2004 to 2021)

Papers

Posted Content

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe

26 Jan 2021-arXiv: Audio and Speech Processing

TL;DR: In this article, an end-to-end ASR system is proposed to overcome the transcription bottleneck and transcriber shortage that hinder endangered language (EL) documentation, along with a novice transcription correction task.

Abstract: "Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxóchitl Mixtec EL corpus. First, we review our method in building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.

4 citations

Book Chapter

An Improved Hierarchical Word Sequence Language Model Using Word Association

Xiaoyi Wu¹, Yuji Matsumoto¹, Kevin Duh¹, Hiroyuki Shindo¹

Nara Institute of Science and Technology¹

24 Nov 2015

TL;DR: The basic HWS approach is improved upon by generalizing it to exploit not only word frequencies but word association, and both intrinsic and extrinsic experiments verify that word association based HWS models can achieve better performance.

Abstract: Language modeling is a fundamental research problem with applications in many NLP tasks. To estimate probabilities, most research on language modeling uses the n-gram approach to factor sentence probabilities. However, the n-gram assumption is too simple to cope with the data sparseness problem, which affects the final performance of language models. The Hierarchical Word Sequence (HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the normal n-gram method. In this paper, we improve upon the basic HWS approach by generalizing it to exploit not only word frequencies but also word association. For evaluation, we compare word association based HWS models to normal HWS models and normal n-gram models. Both intrinsic and extrinsic experiments verify that word association based HWS models can achieve better performance.

4 citations

Proceedings Article

A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Xianchao Wu¹, Katsuhito Sudoh¹, Kevin Duh¹, Hajime Tsukada¹, Masaaki Nagata¹

Nippon Telegraph and Telephone¹

08 Jul 2012

TL;DR: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers, measuring the impact of these non-isomorphic dependency structures when used for string-to-dependency translation.

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these non-isomorphic dependency structures when used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs), which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser's PASs achieved the best dependency and translation accuracies.

4 citations

ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper

Hirofumi Inaguma, Shun Kiyono, Nelson Enrique Yalta Soplin, Jun Suzuki, Kevin Duh, Shinji Watanabe

02 Nov 2019

4 citations

Journal Article

Evaluating Translation Quality with Word Order Correlations

Tsutomu Hirao, Hideki Isozaki, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata

01 Jan 2014

4 citations

Cited by

Automatic differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Z. Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer

28 Oct 2017

TL;DR: Describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models. The module focuses on differentiation of purely imperative programs, with an emphasis on extensibility and low overhead.

Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.

13,268 citations

Posted Content

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke¹, Sam Gross², Francisco Massa², Adam Lerer², James Bradbury³, Gregory Chanan², Trevor Killeen⁴, Zeming Lin², Natalia Gimelshein⁵, Luca Antiga⁶, Alban Desmaison⁷, Andreas Kopf⁸, Edward Z. Yang², Zachary DeVito⁹, Martin Raison², Alykhan Tejani¹⁰, Sasank Chilamkurthy, Benoit Steiner², Lu Fang², Junjie Bai², Soumith Chintala²

University of Warsaw¹, Facebook², Salesforce.com³, University of Washington⁴, Nvidia⁵, Mario Negri Institute for Pharmacological Research⁶, University of Oxford⁷, ETH Zurich⁸, Stanford University⁹, Twitter¹⁰

03 Dec 2019-arXiv: Learning

TL;DR: PyTorch is a machine learning library that provides an imperative and Pythonic programming style, makes debugging easy, and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

12,767 citations

Pattern Recognition and Machine Learning

Christopher M. Bishop¹

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings Article

Language Models are Few-Shot Learners

Tom B. Brown¹, Benjamin Mann, Nick Ryder², Melanie Subbiah, Jared Kaplan³, Prafulla Dhariwal¹, Arvind Neelakantan⁴, Pranav Shyam, Girish Sastry¹, Amanda Askell¹, Sandhini Agarwal¹, Ariel Herbert-Voss¹, Gretchen Krueger¹, Thomas Henighan¹, Rewon Child¹, Aditya Ramesh¹, Daniel M. Ziegler⁵, Jeffrey Wu¹, Clemens Winter, Christopher Hesse¹, Mark Chen¹, Eric Sigler, Mateusz Litwin, Scott Gray¹, Benjamin Chess¹, Jack Clark¹, Christopher Berner, Samuel McCandlish¹, Alec Radford¹, Ilya Sutskever¹, Dario Amodei¹

OpenAI¹, University of California, Berkeley², Johns Hopkins University³, Google⁴, Massachusetts Institute of Technology⁵

28 May 2020

TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

10,132 citations

Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke¹, Sam Gross², Francisco Massa², Adam Lerer², James Bradbury³, Gregory Chanan², Trevor Killeen⁴, Zeming Lin², Natalia Gimelshein⁵, Luca Antiga⁶, Alban Desmaison⁷, Andreas Kopf⁸, Edward Z. Yang², Zachary DeVito⁹, Martin Raison², Alykhan Tejani¹⁰, Sasank Chilamkurthy, Benoit Steiner², Lu Fang¹¹, Junjie Bai², Soumith Chintala²

01 Jan 2019

TL;DR: This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.

10,045 citations
