Author

Alex Rudnick

Bio: Alex Rudnick is an academic researcher from Indiana University. The author has contributed to research in topics: Machine translation & Phrase. The author has an h-index of 6 and has co-authored 15 publications receiving 4,837 citations. Previous affiliations of Alex Rudnick include Google & Georgia Institute of Technology.

Papers
Posted Content
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
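
As a reading aid (not part of the paper itself), the sketch below illustrates the kind of beam-search scoring rule the abstract describes, combining length normalization with a coverage penalty. The functional forms follow the commonly cited GNMT formulation, but the names and the alpha/beta values are placeholders, not the paper's reported settings.

```python
import math

def hypothesis_score(log_prob, target_len, attention, alpha=0.2, beta=0.2):
    """Length-normalized, coverage-penalized beam-search score (sketch).

    log_prob   -- sum of token log-probabilities of the candidate translation
    target_len -- number of target tokens produced so far
    attention  -- attention[i][j]: weight on source token i when emitting
                  target token j (illustrative layout)
    alpha/beta -- tuning constants; placeholder values, not the paper's
    """
    # Length penalty: keeps longer outputs from being unfairly punished for
    # accumulating more (negative) log-probability terms.
    lp = ((5.0 + target_len) ** alpha) / ((5.0 + 1.0) ** alpha)
    # Coverage penalty: rewards hypotheses whose attention has touched every
    # source token; contribution per source position is capped at 1.0.
    cp = beta * sum(
        math.log(min(max(sum(row), 1e-9), 1.0)) for row in attention
    )
    return log_prob / lp + cp

# Toy usage: 3 source tokens, 2 target tokens emitted so far.
print(hypothesis_score(-4.2, 2, [[0.5, 0.6], [0.4, 0.4], [0.1, 0.0]]))
```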

5,737 citations

Journal ArticleDOI
03 Oct 2016-PeerJ
TL;DR: The IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science, is presented, which leverages a historical, ongoing collection of over 70 billion public messages from Twitter.
Abstract: The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.

41 citations

Proceedings ArticleDOI
06 Apr 2008
TL;DR: By analyzing features of users' typing, Automatic Whiteout++ detects and corrects up to 32.37% of the errors made by typists while using a mini-QWERTY (RIM Blackberry style) keyboard.
Abstract: By analyzing features of users' typing, Automatic Whiteout++ detects and corrects up to 32.37% of the errors made by typists while using a mini-QWERTY (RIM Blackberry style) keyboard. The system targets "off-by-one" errors where the user accidentally presses a key adjacent to the one intended. Using a database of typing from longitudinal tests on two different keyboards in a variety of contexts, we show that the system generalizes well across users, model of keyboard, user expertise, and keyboard visibility conditions. Since a goal of Automatic Whiteout++ is to embed it in the firmware of mini-QWERTY keyboards, it does not rely on a dictionary. This feature enables the system to correct errors mid-word instead of applying a correction after the word has been typed. Though we do not use a dictionary, we do examine the effect of varying levels of language context in the system's ability to detect and correct erroneous keypresses.
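
To make the "off-by-one" idea concrete, here is a toy, dictionary-free heuristic sketch: if a typed character forms an unlikely bigram but a physically adjacent key would form a common one, propose the neighbor. The actual system uses classifiers trained on keystroke features; the adjacency map and bigram list below are illustrative fragments, not the paper's model.

```python
# Toy illustration only: partial mini-QWERTY adjacency and a tiny bigram list.
NEIGHBORS = {
    "h": "gjnby", "j": "hkmnu", "e": "wrsd", "r": "etdf", "o": "ipkl",
}
COMMON_BIGRAMS = {"th", "he", "in", "er", "an", "re", "on"}

def correct_off_by_one(prev_char, typed_char):
    """Return a corrected character if an adjacent key is more plausible."""
    if prev_char + typed_char in COMMON_BIGRAMS:
        return typed_char  # the typed bigram already looks plausible
    for candidate in NEIGHBORS.get(typed_char, ""):
        if prev_char + candidate in COMMON_BIGRAMS:
            return candidate  # a neighboring key forms a common bigram
    return typed_char

print(correct_off_by_one("t", "j"))  # -> 'h' ("tj" unlikely, "th" common)
```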

31 citations

Posted Content
TL;DR: This paper describes the recent extensions to Truthy, a system for collecting and analyzing political discourse on Twitter, and introduces several new analytical perspectives on online discourse with the goal of facilitating collaboration between individuals in the computational and social sciences.
Abstract: The broad adoption of the web as a communication medium has made it possible to study social behavior at a new scale. With social media networks such as Twitter, we can collect large data sets of online discourse. Social science researchers and journalists, however, may not have tools available to make sense of large amounts of data or of the structure of large social networks. In this paper, we describe our recent extensions to Truthy, a system for collecting and analyzing political discourse on Twitter. We introduce several new analytical perspectives on online discourse with the goal of facilitating collaboration between individuals in the computational and social sciences. The design decisions described in this article are motivated by real-world use cases developed in collaboration with colleagues at the Indiana University School of Journalism.

20 citations

Proceedings Article
01 Jun 2013
TL;DR: The entries for the SemEval-2013 cross-language word-sense disambiguation task (Lefever and Hoste, 2013) are presented: three systems based on classifiers trained on local context features, with some elaborations.
Abstract: We present our entries for the SemEval-2013 cross-language word-sense disambiguation task (Lefever and Hoste, 2013). We submitted three systems based on classifiers trained on local context features, with some elaborations. Our three systems, in increasing order of complexity, were: maximum entropy classifiers trained to predict the desired target-language phrase using only monolingual features (we called this system L1); similar classifiers, but with the desired target-language phrase for the other four languages as features (L2); and lastly, networks of five classifiers, over which we do loopy belief propagation to solve the classification tasks jointly (MRF).
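
For illustration, the sketch below approximates the monolingual "L1" setup with a multinomial logistic regression (a maximum-entropy classifier) over bag-of-context-word features, using scikit-learn. The training sentences and the Spanish labels are toy placeholders, not the task data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(tokens, i, window=2):
    """Bag of surrounding words within a small window of position i."""
    feats = {}
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats[f"w[{j - i}]={tokens[j]}"] = 1
    return feats

# Toy training data: disambiguating English "bank" into Spanish translations.
sents = [("i deposited money at the bank".split(), 5, "banco"),
         ("we sat on the bank of the river".split(), 4, "orilla")]
X = [context_features(tokens, i) for tokens, i, _ in sents]
y = [label for _, _, label in sents]

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = "the bank approved my loan".split()
print(clf.predict(vec.transform([context_features(test, 1)])))
```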

12 citations


Cited by
Posted Content
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
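
As a concrete illustration of the "one additional output layer" claim, the sketch below puts a linear classifier on top of BERT's pooled representation. It uses the Hugging Face transformers package (not the paper's original codebase); the checkpoint name and the two-class setup are illustrative assumptions.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # the extra output layer

inputs = tokenizer("a surprisingly watchable film", return_tensors="pt")
outputs = bert(**inputs)
logits = classifier(outputs.pooler_output)  # shape: (1, 2)
print(logits)

# During fine-tuning, BERT and the classifier are updated jointly, e.g.:
# loss = nn.CrossEntropyLoss()(logits, labels); loss.backward(); optimizer.step()
```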

29,480 citations

Proceedings ArticleDOI
02 Nov 2016
TL;DR: TensorFlow as mentioned in this paper is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
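
A minimal sketch of the dataflow-graph model the abstract describes, written against TensorFlow's 1.x-style (tf.compat.v1) interface for illustration: operations are graph nodes, tensors flow along the edges, a variable holds mutable shared state, and a session executes the graph. Shapes and values are illustrative.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Nodes of the dataflow graph; tensors flow along its edges.
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.compat.v1.get_variable("w", initializer=tf.ones([3, 1]))  # shared, mutable state
    y = tf.matmul(x, w, name="y")
    init = tf.compat.v1.global_variables_initializer()

# The session maps the graph onto the available devices and runs it.
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # [[6.]]
```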

10,913 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: ResNeXt as discussed by the authors is a simple, highly modularized network architecture for image classification, which is constructed by repeating a building block that aggregates a set of transformations with the same topology.
Abstract: We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call cardinality (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.
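
The sketch below shows how the aggregated-transformations idea is commonly realized: a bottleneck residual block whose 3x3 convolution is grouped, with the number of groups playing the role of cardinality. Written in PyTorch for illustration; the channel widths echo the 32x4d configuration but are not taken from the paper's released code.

```python
import torch
from torch import nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            # cardinality parallel transformations with identical topology,
            # implemented as a single grouped convolution
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.net(x))  # residual connection

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBlock()(x).shape)  # torch.Size([1, 256, 56, 56])
```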

7,183 citations

Posted Content
TL;DR: A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
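
For reference, here is a minimal PyTorch sketch of the scaled dot-product attention at the core of the architecture, softmax(QK^T / sqrt(d_k)) V; the tensor shapes and random inputs are illustrative only.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, seq_len, d_k) tensors."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # attention weights
    return weights @ v

q = k = v = torch.randn(2, 5, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```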

7,019 citations

Posted Content
TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
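
To illustrate the text-to-text framing, the snippet below feeds task-prefixed strings to a pre-trained T5 checkpoint and decodes text back out for two different tasks. It relies on the Hugging Face transformers package as an illustrative assumption (not the paper's released codebase), and the prompts are toy inputs.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: The city council met on Tuesday and approved a new budget "
    "that increases funding for public transit.",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=40)  # text in, text out
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```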

6,953 citations