
Showing papers by Jeffrey Dean published in 2016


Proceedings ArticleDOI
02 Nov 2016
TL;DR: TensorFlow as mentioned in this paper is a machine learning system that operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

10,913 citations
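To make the dataflow model concrete, here is a minimal sketch using the TensorFlow 1.x graph-mode API (tf.compat.v1 in current releases); the device strings, shapes, and soft-placement configuration are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal sketch of the dataflow model described above, using the
# TensorFlow 1.x graph-mode API. Device strings and shapes are
# illustrative assumptions.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Nodes are operations; edges carry tensors between them.
    with tf.device("/cpu:0"):                                  # explicit placement on a CPU
        x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
        w = tf.Variable(tf.random_normal([4, 2]), name="w")    # mutable shared state in the graph
    with tf.device("/gpu:0"):                                  # placement on a GPU, if available
        y = tf.matmul(x, w, name="y")

# Running the graph maps each node onto the requested device
# (falling back to CPU when soft placement is allowed and no GPU exists).
with tf.Session(graph=graph,
                config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
```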


Posted Content
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.

5,737 citations
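The abstract mentions a length-normalization procedure and a coverage penalty in beam search. The sketch below shows one way such a rescoring function can look; the functional form follows the GNMT paper's published formulation, but the alpha/beta values and the attention matrix are illustrative assumptions.

```python
# Sketch of the beam-search rescoring idea mentioned above: a length
# normalization term plus a coverage penalty over attention weights.
# alpha, beta, and the attention matrix are illustrative values.
import math

def length_penalty(length, alpha=0.6):
    # Penalizes short hypotheses less harshly than dividing by raw length.
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

def coverage_penalty(attention, beta=0.2):
    # attention[i][j]: weight on source word j when emitting target word i.
    # Rewards hypotheses whose attention covers every source word.
    num_source = len(attention[0])
    total = 0.0
    for j in range(num_source):
        covered = min(sum(row[j] for row in attention), 1.0)
        total += math.log(covered)
    return beta * total

def hypothesis_score(log_prob, target_length, attention):
    return log_prob / length_penalty(target_length) + coverage_penalty(attention)

# Example: a 3-word hypothesis over a 2-word source sentence.
attn = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(hypothesis_score(log_prob=-4.2, target_length=3, attention=attn))
```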


Posted Content
TL;DR: The TensorFlow dataflow model is described and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

5,542 citations
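One point worth illustrating from this abstract is that shared state lives inside the graph: parameters are Variable nodes and updates are ordinary operations, rather than being managed by a built-in parameter server. A minimal sketch, again assuming the 1.x graph-mode API, with illustrative names and values:

```python
# Sketch of "shared state as part of the graph": a Variable node holds
# mutable state, and an assign op in the graph mutates it. Names and
# values here are illustrative assumptions.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

step = tf.Variable(0, dtype=tf.int32, name="global_step")   # mutable state node
increment = tf.assign_add(step, 1, name="increment_step")   # op that mutates that state

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        print(sess.run(increment))   # prints 1, 2, 3: state persists across run() calls
```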


Posted Content
TL;DR: The authors propose to add an artificial token at the beginning of the input sentence to specify the required target language, which improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant.
Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT'14 and WMT'15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.

947 citations
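The mechanism described is small enough to sketch directly: the only change to the input pipeline is prepending an artificial token that names the target language before wordpiece tokenization. The token spelling below is an illustrative assumption, not necessarily the exact string used in the paper.

```python
# Sketch of the artificial-token mechanism described above: prepend a
# target-language token to the source sentence; the shared encoder,
# decoder, and attention are otherwise unchanged.
def add_target_token(source_sentence, target_lang):
    token = "<2{}>".format(target_lang)   # e.g. "<2es>" meaning "translate to Spanish"
    return "{} {}".format(token, source_sentence)

print(add_target_token("Hello, how are you?", "es"))
# -> "<2es> Hello, how are you?"
```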


Journal ArticleDOI
TL;DR: Database researchers see big data as a defining challenge; making the most of the enormous opportunities at hand will require focusing on five research areas.
Abstract: Database researchers paint big data as a defining challenge. To make the most of the enormous opportunities at hand will require focusing on five research areas.

77 citations


Patent
28 Oct 2016
TL;DR: In this patent, the computational graph is partitioned into a plurality of subgraphs, and for each subgraph the operations represented by its nodes are assigned to a respective available device from the plurality of available devices for execution.
Abstract: To provide a technology for processing computational graphs expressing a neural network. SOLUTION: Methods include: receiving a request from a client to process a computational graph; and obtaining data representing the computational graph. The computational graph includes a plurality of nodes and directed edges. Each node represents a respective operation. Each directed edge connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node. Further, a plurality of available devices for performing the requested operation are identified and the computational graph is partitioned into a plurality of subgraphs. For each subgraph, the operations represented by the one or more nodes in the subgraph are assigned to a respective available device in the plurality of available devices for execution. SELECTED DRAWING: Figure 2

24 citations
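A rough sketch of the partition-and-assign flow claimed above: nodes of a computational graph are grouped into subgraphs, and each subgraph is assigned to one of the available devices. The grouping heuristic here (fixed-size chunks over a topological order, assigned round-robin) is purely illustrative and is not the patent's placement algorithm.

```python
# Illustrative partition-and-assign sketch: group graph nodes into
# subgraphs along a topological order, then map each subgraph to an
# available device.
from collections import deque

def topological_order(nodes, edges):
    # edges: list of (src, dst) pairs; returns nodes in dependency order.
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    queue.append(dst)
    return order

def partition_and_assign(nodes, edges, devices, subgraph_size=2):
    order = topological_order(nodes, edges)
    subgraphs = [order[i:i + subgraph_size] for i in range(0, len(order), subgraph_size)]
    # Round-robin assignment of each subgraph to an available device.
    return [(devices[i % len(devices)], sg) for i, sg in enumerate(subgraphs)]

nodes = ["input", "matmul", "bias_add", "softmax"]
edges = [("input", "matmul"), ("matmul", "bias_add"), ("bias_add", "softmax")]
print(partition_and_assign(nodes, edges, devices=["/cpu:0", "/gpu:0"]))
# e.g. [('/cpu:0', ['input', 'matmul']), ('/gpu:0', ['bias_add', 'softmax'])]
```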


Proceedings ArticleDOI
Jeffrey Dean
08 Feb 2016
TL;DR: Some of the design decisions made in building TensorFlow are highlighted, research results produced within the group are discussed, and ways in which these ideas have been applied to a variety of problems in Google's products are described.
Abstract: For the past five years, the Google Brain team has focused on conducting research in difficult problems in artificial intelligence, on building large-scale computer systems for machine learning research, and, in collaboration with many teams at Google, on applying our research and systems to dozens of Google products. Our group has recently open-sourced the TensorFlow system (tensorflow.org), a system designed to easily express machine learning ideas, and to quickly train, evaluate and deploy machine learning systems. In this talk, I'll highlight some of the design decisions we made in building TensorFlow, discuss research results produced within our group, and describe ways in which these ideas have been applied to a variety of problems in Google's products, usually in close collaboration with other teams. This talk describes joint work with many people at Google.

20 citations



Patent
Jeffrey Dean
27 Sep 2016

3 citations


Proceedings ArticleDOI
Jeffrey Dean
14 Jun 2016
TL;DR: This talk will highlight some of the advances that have been made in deep learning and suggest some interesting directions for future research in data management.
Abstract: Over the past five years, deep learning and large-scale neural networks have made significant advances in speech recognition, computer vision, language understanding and translation, robotics, and many other fields. Deep learning allows the use of very raw forms of data in order to build higher-level understanding of data automatically, and can also be used to learn to accomplish complex tasks. In the next decade, it is likely that a fruitful direction for research in data management will be in how to seamlessly integrate these kinds of machine learning models into systems that store and manage data. In this talk, I will highlight some of the advances that have been made in deep learning and suggest some interesting directions for future research.

2 citations


Patent
Yuan Yu, Jeffrey Dean
08 Nov 2016
TL;DR: In this patent, the authors present a method for processing loops in computational graphs representing machine learning models: data identifying an allocation of the computational graph across devices is obtained, and for each node that represents a control flow statement, a structure of nodes and edges providing the current state of recursion or iteration in that control flow statement is generated and inserted into the graph.
Abstract: Systems and methods for processing loops in computational graphs representing machine learning models are disclosed. An example method begins with obtaining data representing a computational graph. Data identifying an allocation of the computational graph across devices is obtained. Additionally, one or more nodes in the computational graph that represent a respective control flow statement are identified. For each identified node, a structure of nodes and edges that represents an operation that provides a current state of recursion or iteration in the respective control flow statement is generated. This structure is inserted into the computational graph and the allocation of nodes to devices is modified to assign the structure to a device.
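For a sense of how loop state can live inside a computational graph, here is a small sketch using TensorFlow's graph-mode while_loop, which inserts control-flow nodes that carry the per-iteration state; the loop body is an illustrative assumption, not an example taken from the patent.

```python
# Sketch of control flow expressed as graph structure: tf.while_loop adds
# control-flow nodes to the graph, and (i, total) is the per-iteration
# state those nodes track. The loop body is an illustrative assumption.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def condition(i, total):
    return i < 5                      # keep iterating while i < 5

def body(i, total):
    return i + 1, total + i           # advance the loop state

i0 = tf.constant(0)
total0 = tf.constant(0)
final_i, final_total = tf.while_loop(condition, body, loop_vars=[i0, total0])

with tf.Session() as sess:
    print(sess.run([final_i, final_total]))   # [5, 10]
```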