Author

Alex Rudnick

Bio: Alex Rudnick is an academic researcher from Indiana University. The author has contributed to research in topics: Machine translation & Phrase. The author has an h-index of 6 and has co-authored 15 publications receiving 4,837 citations. Previous affiliations of Alex Rudnick include Google & Georgia Institute of Technology.

Papers
Posted Content
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
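
As a reading aid (not part of the paper itself), the sketch below illustrates the kind of beam-search scoring rule the abstract describes, combining length normalization with a coverage penalty. The functional forms follow the commonly cited GNMT formulation, but the names and the alpha/beta values are placeholders, not the paper's reported settings.

```python
import math

def hypothesis_score(log_prob, target_len, attention, alpha=0.2, beta=0.2):
    """Length-normalized, coverage-penalized beam-search score (sketch).

    log_prob   -- sum of token log-probabilities of the candidate translation
    target_len -- number of target tokens produced so far
    attention  -- attention[i][j]: weight on source token i when emitting
                  target token j (illustrative layout)
    alpha/beta -- tuning constants; placeholder values, not the paper's
    """
    # Length penalty: keeps longer outputs from being unfairly punished for
    # accumulating more (negative) log-probability terms.
    lp = ((5.0 + target_len) ** alpha) / ((5.0 + 1.0) ** alpha)
    # Coverage penalty: rewards hypotheses whose attention has touched every
    # source token; contribution per source position is capped at 1.0.
    cp = beta * sum(
        math.log(min(max(sum(row), 1e-9), 1.0)) for row in attention
    )
    return log_prob / lp + cp

# Toy usage: 3 source tokens, 2 target tokens emitted so far.
print(hypothesis_score(-4.2, 2, [[0.5, 0.6], [0.4, 0.4], [0.1, 0.0]]))
```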

5,737 citations

Journal ArticleDOI
03 Oct 2016-PeerJ
TL;DR: The IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science, is presented, which leverages a historical, ongoing collection of over 70 billion public messages from Twitter.
Abstract: The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.

41 citations

Proceedings ArticleDOI
06 Apr 2008
TL;DR: By analyzing features of users' typing, Automatic Whiteout++ detects and corrects up to 32.37% of the errors made by typists while using a mini-QWERTY (RIM Blackberry style) keyboard.
Abstract: By analyzing features of users' typing, Automatic Whiteout++ detects and corrects up to 32.37% of the errors made by typists while using a mini-QWERTY (RIM Blackberry style) keyboard. The system targets "off-by-one" errors where the user accidentally presses a key adjacent to the one intended. Using a database of typing from longitudinal tests on two different keyboards in a variety of contexts, we show that the system generalizes well across users, model of keyboard, user expertise, and keyboard visibility conditions. Since a goal of Automatic Whiteout++ is to embed it in the firmware of mini-QWERTY keyboards, it does not rely on a dictionary. This feature enables the system to correct errors mid-word instead of applying a correction after the word has been typed. Though we do not use a dictionary, we do examine the effect of varying levels of language context in the system's ability to detect and correct erroneous keypresses.
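
To make the "off-by-one" idea concrete, here is a toy, dictionary-free heuristic sketch: if a typed character forms an unlikely bigram but a physically adjacent key would form a common one, propose the neighbor. The actual system uses classifiers trained on keystroke features; the adjacency map and bigram list below are illustrative fragments, not the paper's model.

```python
# Toy illustration only: partial mini-QWERTY adjacency and a tiny bigram list.
NEIGHBORS = {
    "h": "gjnby", "j": "hkmnu", "e": "wrsd", "r": "etdf", "o": "ipkl",
}
COMMON_BIGRAMS = {"th", "he", "in", "er", "an", "re", "on"}

def correct_off_by_one(prev_char, typed_char):
    """Return a corrected character if an adjacent key is more plausible."""
    if prev_char + typed_char in COMMON_BIGRAMS:
        return typed_char  # the typed bigram already looks plausible
    for candidate in NEIGHBORS.get(typed_char, ""):
        if prev_char + candidate in COMMON_BIGRAMS:
            return candidate  # a neighboring key forms a common bigram
    return typed_char

print(correct_off_by_one("t", "j"))  # -> 'h' ("tj" unlikely, "th" common)
```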

31 citations

Posted Content
TL;DR: This paper describes the recent extensions to Truthy, a system for collecting and analyzing political discourse on Twitter, and introduces several new analytical perspectives on online discourse with the goal of facilitating collaboration between individuals in the computational and social sciences.
Abstract: The broad adoption of the web as a communication medium has made it possible to study social behavior at a new scale. With social media networks such as Twitter, we can collect large data sets of online discourse. Social science researchers and journalists, however, may not have tools available to make sense of large amounts of data or of the structure of large social networks. In this paper, we describe our recent extensions to Truthy, a system for collecting and analyzing political discourse on Twitter. We introduce several new analytical perspectives on online discourse with the goal of facilitating collaboration between individuals in the computational and social sciences. The design decisions described in this article are motivated by real-world use cases developed in collaboration with colleagues at the Indiana University School of Journalism.

20 citations

Proceedings Article
01 Jun 2013
TL;DR: The entries for the SemEval-2013 cross-language word-sense disambiguation task (Lefever and Hoste, 2013) are presented: three systems based on classifiers trained on local context features, with some elaborations.
Abstract: We present our entries for the SemEval-2013 cross-language word-sense disambiguation task (Lefever and Hoste, 2013). We submitted three systems based on classifiers trained on local context features, with some elaborations. Our three systems, in increasing order of complexity, were: maximum entropy classifiers trained to predict the desired target-language phrase using only monolingual features (we called this system L1); similar classifiers, but with the desired target-language phrase for the other four languages as features (L2); and lastly, networks of five classifiers, over which we do loopy belief propagation to solve the classification tasks jointly (MRF).
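
For illustration, the sketch below approximates the monolingual "L1" setup with a multinomial logistic regression (a maximum-entropy classifier) over bag-of-context-word features, using scikit-learn. The training sentences and the Spanish labels are toy placeholders, not the task data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(tokens, i, window=2):
    """Bag of surrounding words within a small window of position i."""
    feats = {}
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats[f"w[{j - i}]={tokens[j]}"] = 1
    return feats

# Toy training data: disambiguating English "bank" into Spanish translations.
sents = [("i deposited money at the bank".split(), 5, "banco"),
         ("we sat on the bank of the river".split(), 4, "orilla")]
X = [context_features(tokens, i) for tokens, i, _ in sents]
y = [label for _, _, label in sents]

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = "the bank approved my loan".split()
print(clf.predict(vec.transform([context_features(test, 1)])))
```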

12 citations


Cited by
Posted Content
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
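
As a concrete illustration of the "one additional output layer" claim, the sketch below puts a linear classifier on top of BERT's pooled representation. It uses the Hugging Face transformers package (not the paper's original codebase); the checkpoint name and the two-class setup are illustrative assumptions.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # the extra output layer

inputs = tokenizer("a surprisingly watchable film", return_tensors="pt")
outputs = bert(**inputs)
logits = classifier(outputs.pooler_output)  # shape: (1, 2)
print(logits)

# During fine-tuning, BERT and the classifier are updated jointly, e.g.:
# loss = nn.CrossEntropyLoss()(logits, labels); loss.backward(); optimizer.step()
```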

29,480 citations

Proceedings ArticleDOI
02 Nov 2016
TL;DR: TensorFlow as mentioned in this paper is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
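
A minimal sketch of the dataflow-graph model the abstract describes, written against TensorFlow's 1.x-style (tf.compat.v1) interface for illustration: operations are graph nodes, tensors flow along the edges, a variable holds mutable shared state, and a session executes the graph. Shapes and values are illustrative.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Nodes of the dataflow graph; tensors flow along its edges.
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.compat.v1.get_variable("w", initializer=tf.ones([3, 1]))  # shared, mutable state
    y = tf.matmul(x, w, name="y")
    init = tf.compat.v1.global_variables_initializer()

# The session maps the graph onto the available devices and runs it.
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # [[6.]]
```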

10,913 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: ResNeXt as discussed by the authors is a simple, highly modularized network architecture for image classification, which is constructed by repeating a building block that aggregates a set of transformations with the same topology.
Abstract: We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call cardinality (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.
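
The sketch below shows how the aggregated-transformations idea is commonly realized: a bottleneck residual block whose 3x3 convolution is grouped, with the number of groups playing the role of cardinality. Written in PyTorch for illustration; the channel widths echo the 32x4d configuration but are not taken from the paper's released code.

```python
import torch
from torch import nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            # cardinality parallel transformations with identical topology,
            # implemented as a single grouped convolution
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.net(x))  # residual connection

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 56, 56])
```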

7,183 citations

Posted Content
TL;DR: A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
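
For reference, here is a minimal PyTorch sketch of the scaled dot-product attention at the core of the architecture, softmax(QK^T / sqrt(d_k)) V; the tensor shapes and random inputs are illustrative only.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, seq_len, d_k) tensors."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # attention weights
    return weights @ v

q = k = v = torch.randn(2, 5, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```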

7,019 citations

Posted Content
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
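
To illustrate the text-to-text framing, the snippet below feeds task-prefixed strings to a pre-trained T5 checkpoint and decodes text back out for two different tasks. It relies on the Hugging Face transformers package as an illustrative assumption (not the paper's released codebase), and the prompts are toy inputs.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: The city council met on Tuesday and approved a new budget "
    "that increases funding for public transit.",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=40)  # text in, text out
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```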

6,953 citations