
Showing papers by "Jeffrey Dean" published in 2013


Proceedings Article
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
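
The negative-sampling objective mentioned above is easy to state in code. The sketch below computes the loss for a single (center, context) pair with k sampled negative words; the vector dimensionality, the random vectors, and k are assumptions made for illustration, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one (center, context) pair with k sampled negatives:
    -log sigma(v_ctx . v_center) - sum_i log sigma(-v_neg_i . v_center)."""
    pos = np.log(sigmoid(context_vec @ center_vec))
    neg = np.sum(np.log(sigmoid(-negative_vecs @ center_vec)))
    return -(pos + neg)

# Toy usage with random 100-dimensional vectors and k = 5 negatives.
dim, k = 100, 5
loss = negative_sampling_loss(rng.normal(size=dim),
                              rng.normal(size=dim),
                              rng.normal(size=(k, dim)))
```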

24,012 citations


Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
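
For readers who want to train word vectors of this kind themselves, the sketch below uses the gensim library's Word2Vec implementation (assuming gensim 4.x, where the dimensionality parameter is named vector_size); the tiny corpus and hyperparameters are placeholders rather than the paper's 1.6-billion-word configuration.

```python
from gensim.models import Word2Vec

# Placeholder corpus: in practice this would be an iterator over a large
# tokenised text collection rather than a few hand-written sentences.
sentences = [
    ["the", "king", "rules", "the", "country"],
    ["the", "queen", "rules", "the", "country"],
    ["paris", "is", "the", "capital", "of", "france"],
]

# sg=1 selects the Skip-gram architecture; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1,
                 sg=1, workers=4, epochs=10)

vector = model.wv["king"]                 # the learned vector for a word
similar = model.wv.most_similar("king")   # nearest words by cosine similarity
```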

20,077 citations


Posted Content
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
TL;DR: In this paper, the continuous Skip-gram model is extended to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, improving both the quality of the vectors and the training speed.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
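
The phrase-finding step described above scores each bigram by how much more often its words occur together than apart, using a count-based score of the form score(w1, w2) = (count(w1 w2) - delta) / (count(w1) * count(w2)). The sketch below applies that rule; the toy corpus, discount delta, and threshold are illustrative assumptions.

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=1e-4):
    """Return bigrams whose co-occurrence score exceeds the threshold.
    The discount delta suppresses phrases formed from very rare words."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = {}
    for (w1, w2), n in bigrams.items():
        score = (n - delta) / (unigrams[w1] * unigrams[w2])
        if score > threshold:
            phrases[(w1, w2)] = score
    return phrases

# Toy usage: with a real corpus, ("air", "canada") would score highly and be
# re-tokenised as the single token "air_canada" for a further training pass.
tokens = "air canada flies to toronto and air canada flies to montreal".split()
print(find_phrases(tokens, delta=1, threshold=1e-3))
```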

11,343 citations


Proceedings Article
16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
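
The syntactic and semantic similarity tests mentioned above take the form of word analogies, a : b :: c : ?, answered with vector arithmetic. The sketch below shows that evaluation rule over an assumed dictionary wv of pre-trained word vectors; it is not the paper's test set.

```python
import numpy as np

def analogy(wv, a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word whose vector is
    closest (by cosine similarity) to vec(b) - vec(a) + vec(c)."""
    target = wv[b] - wv[a] + wv[c]
    best, best_sim = None, -np.inf
    for word, vec in wv.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# With good vectors, analogy(wv, "man", "king", "woman") should return "queen";
# wv is assumed to be a dict mapping words to numpy arrays of equal dimension.
```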

9,270 citations


Proceedings Article
05 Dec 2013
TL;DR: This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
Abstract: Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources - such as text data - both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model.
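
To make the training objective concrete, the sketch below implements a hinge rank loss of the kind such a visual-semantic embedding model is trained with: the projected image embedding should score higher against the correct label's word vector than against other labels, by a margin. The projection matrix, margin, dimensions, and random vectors are placeholder assumptions, not the paper's trained parameters.

```python
import numpy as np

def hinge_rank_loss(image_emb, label_vec, other_label_vecs, margin=0.1):
    """Sum over incorrect labels of max(0, margin - s(correct) + s(wrong)),
    where s(.) is the dot-product similarity between the projected image
    embedding and a label's word vector."""
    correct = image_emb @ label_vec
    wrong = other_label_vecs @ image_emb
    return np.sum(np.maximum(0.0, margin - correct + wrong))

# Toy usage: project CNN features into the word-vector space with an assumed
# learned matrix M, then score against the label vocabulary.
rng = np.random.default_rng(0)
visual_dim, embed_dim, n_labels = 2048, 300, 10
M = rng.normal(size=(embed_dim, visual_dim)) * 0.01   # learned in practice
features = rng.normal(size=visual_dim)                 # CNN features for one image
labels = rng.normal(size=(n_labels, embed_dim))        # word vectors for the labels
loss = hinge_rank_loss(M @ features, labels[0], labels[1:])
```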

2,461 citations


Journal ArticleDOI
Jeffrey Dean, Luiz Andre Barroso
TL;DR: Software techniques that tolerate latency variability are vital to building responsive large-scale Web services.
Abstract: Software techniques that tolerate latency variability are vital to building responsive large-scale Web services.
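
One of the techniques the article describes is the hedged request: send the request to one replica and, if no reply arrives within a short delay, send an identical request to another replica and use whichever reply comes back first. Below is a minimal Python sketch; fetch and the replica objects are assumed stand-ins for a real client, and outstanding duplicates are simply abandoned rather than cancelled.

```python
import concurrent.futures as cf

def hedged_get(key, replicas, fetch, hedge_after_s=0.010):
    """Issue the request to the first replica; if it has not completed within
    hedge_after_s seconds, issue a duplicate to a second replica and return
    whichever answer arrives first.  Assumes at least two replicas and a
    callable fetch(replica, key)."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(fetch, replicas[0], key)]
    done, _ = cf.wait(futures, timeout=hedge_after_s)
    if not done:  # primary is slow: hedge to the next replica
        futures.append(pool.submit(fetch, replicas[1], key))
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    result = next(iter(done)).result()
    pool.shutdown(wait=False)   # leave any still-running duplicate behind
    return result
```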

1,613 citations


Proceedings ArticleDOI
26 May 2013
TL;DR: This work shows that generalization can be improved and training of deep networks made faster and simpler by substituting the logistic units with rectified linear units.
Abstract: Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task achieving lower word error rates than using a logistic network with the same topology. Similarly in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
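
For reference, the substitution described above replaces the logistic non-linearity with the rectified linear unit, which is linear for positive inputs and exactly zero otherwise. The short sketch below compares the two activations and their gradients; the sample inputs are illustrative only.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Linear for positive inputs, exactly zero otherwise.
    return np.maximum(0.0, x)

x = np.linspace(-6, 6, 7)
print(logistic(x))                      # saturates toward 0 and 1
print(logistic(x) * (1 - logistic(x)))  # gradient vanishes for large |x|
print(relu(x))                          # unbounded for x > 0
print((x > 0).astype(float))            # gradient is 1 wherever the unit is active
```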

541 citations


Journal ArticleDOI
TL;DR: Spanner as mentioned in this paper is Google's scalable, multiversion, globally distributed, and synchronously replicated database, which is the first system to distribute data at global scale and support externally-consistent distributed transactions.
Abstract: Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.
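
A toy sketch of the kind of time API described above: now() returns an interval guaranteed to contain the true absolute time, and a commit waits until its chosen timestamp is definitely in the past before its effects become visible. The uncertainty bound and polling loop here are illustrative assumptions, not Spanner's implementation.

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float
    latest: float

class TrueTimeSketch:
    """Toy stand-in for a clock-uncertainty API: now() returns an interval
    that contains the true time.  EPSILON is an assumed worst-case bound."""
    EPSILON = 0.007  # 7 ms, illustrative only

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.EPSILON, t + self.EPSILON)

    def after(self, t: float) -> bool:
        return self.now().earliest > t

def commit_wait(tt: TrueTimeSketch, commit_ts: float) -> None:
    # Commit wait: do not expose the write until the commit timestamp is
    # guaranteed to be in the past on every machine's clock.
    while not tt.after(commit_ts):
        time.sleep(0.001)
```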

493 citations


Proceedings ArticleDOI
26 May 2013
TL;DR: Experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total show average relative gains over the monolingual baselines, but additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks.
Abstract: Today's speech recognition technology is mature enough to be useful for many practical applications. In this context, it is of paramount importance to train accurate acoustic models for many languages within given resource constraints such as data, processing power, and time. Multilingual training has the potential to solve the data issue and close the performance gap between resource-rich and resource-scarce languages. Neural networks lend themselves naturally to parameter sharing across languages, and distributed implementations have made it feasible to train large networks. In this paper, we present experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total. The average relative gains over the monolingual baselines are 4%/2% (data-scarce/data-rich languages) for cross- and 7%/2% for multi-lingual training. However, the additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks, compared to two weeks (monolingual) and one week (crosslingual).
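
The parameter sharing mentioned above is typically realised as hidden layers shared across all languages with a separate output layer per language. The numpy sketch below shows that forward pass; the layer sizes, language list, and random initialisation are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
languages = ["fr", "it", "es"]           # illustrative subset of languages
feat_dim, hidden_dim, n_states = 440, 1024, 2000

# Hidden layers are shared by every language; each language gets its own
# output layer over its language-specific acoustic states.
shared = [rng.normal(scale=0.01, size=(feat_dim, hidden_dim)),
          rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))]
heads = {lang: rng.normal(scale=0.01, size=(hidden_dim, n_states))
         for lang in languages}

def forward(x, lang):
    """Forward pass: shared hidden layers, then the selected language's softmax."""
    h = x
    for W in shared:
        h = np.maximum(0.0, h @ W)       # ReLU hidden units
    logits = h @ heads[lang]
    e = np.exp(logits - logits.max())
    return e / e.sum()

posterior = forward(rng.normal(size=feat_dim), "fr")
```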

330 citations


Posted Content
TL;DR: In this paper, images are mapped into the semantic embedding space via a convex combination of the class label embedding vectors, which requires no additional training.
Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional n-way classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing n-way image classifier and a semantic word embedding model, which contains the n class labels in its vocabulary. Our method maps images into the semantic embedding space via convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state of the art methods on the ImageNet zero-shot learning task.
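
A compact sketch of the convex-combination mapping described above: take the classifier's top-k class probabilities, renormalise them, and use them as weights on the corresponding class-label word vectors; zero-shot prediction is then a nearest-neighbour search among candidate label embeddings. The dimensions, vectors, and value of k are assumptions.

```python
import numpy as np

def conse_embed(probs, class_embeddings, top_k=10):
    """Map an image into the word-embedding space as the convex combination of
    the embeddings of its top-k predicted classes; probs is the classifier's
    softmax output and class_embeddings[i] the word vector of class i."""
    top = np.argsort(probs)[::-1][:top_k]
    weights = probs[top] / probs[top].sum()      # renormalise to a convex combination
    return weights @ class_embeddings[top]

def zero_shot_predict(image_vec, candidate_embeddings):
    # Nearest candidate label by cosine similarity in the embedding space.
    sims = candidate_embeddings @ image_vec / (
        np.linalg.norm(candidate_embeddings, axis=1) * np.linalg.norm(image_vec) + 1e-12)
    return int(np.argmax(sims))

# Toy usage with random stand-ins for the classifier output and word vectors.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(1000))            # assumed 1000-way classifier output
class_emb = rng.normal(size=(1000, 300))        # word vectors for training labels
novel_emb = rng.normal(size=(20000, 300))       # word vectors for unseen labels
label_idx = zero_shot_predict(conse_embed(probs, class_emb), novel_emb)
```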

192 citations



Patent
Greg S. Corrado, Kai Chen, Jeffrey Dean, Samy Bengio, Rajat Monga, Matthieu Devin
15 Aug 2013


Patent
21 May 2013
TL;DR: In this article, a distributed storage system is provided, which includes multiple front-end servers and zones for managing data for clients; data associated with different accounts may be replicated within the distributed storage system using different data replication policies.
Abstract: A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. There is no limit to the amount of data for an account, since new splits can be added to the distributed storage system as needed. In response to a client request for a particular account's data, a front-end server communicates such request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client.

Posted Content
TL;DR: This work considers a simple approach to encapsulate common sense knowledge using co-occurrence statistics from web documents, and observes significant improvements in recognition and localization rates for both ImageNet Detection 2012 and Sun 2012 datasets.
Abstract: Object recognition and localization are important tasks in computer vision. The focus of this work is the incorporation of contextual information in order to improve object recognition and localization. For instance, it is natural to expect not to see an elephant to appear in the middle of an ocean. We consider a simple approach to encapsulate such common sense knowledge using co-occurrence statistics from web documents. By merely counting the number of times nouns (such as elephants, sharks, oceans, etc.) co-occur in web documents, we obtain a good estimate of expected co-occurrences in visual data. We then cast the problem of combining textual co-occurrence statistics with the predictions of image-based classifiers as an optimization problem. The resulting optimization problem serves as a surrogate for our inference procedure. Albeit the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy. Concretely, we observe significant improvements in recognition and localization rates for both ImageNet Detection 2012 and Sun 2012 datasets.
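
The core idea above can be sketched directly: count how often label words co-occur in documents, then nudge each detector's score according to how compatible its label is with the other labels detected in the image. The code below is a crude surrogate for the paper's optimization formulation, with made-up weights and a toy document collection.

```python
from collections import Counter
from itertools import combinations
import math

def cooccurrence_counts(documents):
    """Count how often each unordered pair of words appears in the same document."""
    pair_counts = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def rescore(detections, pair_counts, weight=0.1):
    """Adjust each detection score by the log co-occurrence of its label with
    the other detected labels (a crude stand-in for the paper's optimisation)."""
    rescored = {}
    for label, score in detections.items():
        context = sum(math.log1p(pair_counts[tuple(sorted((label, other)))])
                      for other in detections if other != label)
        rescored[label] = score + weight * context
    return rescored

docs = [["elephant", "savanna", "grass"], ["shark", "ocean", "water"],
        ["elephant", "grass"], ["boat", "ocean"]]
counts = cooccurrence_counts(docs)
print(rescore({"elephant": 0.6, "ocean": 0.55, "grass": 0.3}, counts))
```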

Patent
15 Apr 2013
TL;DR: In this paper, a multilingual deep neural network (DNN) acoustic model is processed based on the training data, and the weights associated with the multiple layers of nodes of the processed multilingual DNN acoustic model are stored in a database.
Abstract: Methods and systems for processing multilingual DNN acoustic models are described. An example method may include receiving training data that includes a respective training data set for each of two or more languages. A multilingual deep neural network (DNN) acoustic model may be processed based on the training data. The multilingual DNN acoustic model may include a feedforward neural network having multiple layers of one or more nodes. Each node of a given layer may connect with a respective weight to each node of a subsequent layer, and the multiple layers of one or more nodes may include one or more shared hidden layers of nodes and a language-specific output layer of nodes corresponding to each of the two or more languages. Additionally, weights associated with the multiple layers of one or more nodes of the processed multilingual DNN acoustic model may be stored in a database.

Patent
04 Jun 2013
TL;DR: In this paper, a method of storing data on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors is described.
Abstract: A method of storing data is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The data storage server receives a first and second data request, the requests including a first and second range of one or more keys and an associated first and second value respectively. The data storage server identifies one or more overlap points associated with the first range and the second range. For each of the overlap points, the data storage server then creates data items including ranges of keys, the ranges of each data item including one or more keys that are either: (a) the keys between a terminal key of the first or second range and the overlap point, or (b) the keys between two adjacent overlap points.
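
The overlap handling described above can be sketched as splitting two possibly-overlapping key ranges at their overlap points into disjoint segments. In the sketch below, the conflict rule on the overlapped segment (the second request wins) is an assumption made for illustration, not something the abstract specifies.

```python
def split_ranges(r1, v1, r2, v2):
    """Split two possibly-overlapping key ranges into disjoint data items.
    Ranges are (start, end) tuples over comparable keys; each returned item
    pairs a sub-range with the value of the request that covers it."""
    points = sorted({r1[0], r1[1], r2[0], r2[1]})
    items = []
    for lo, hi in zip(points, points[1:]):
        in1 = r1[0] <= lo and hi <= r1[1]
        in2 = r2[0] <= lo and hi <= r2[1]
        if in2:                        # assumed rule: the second request wins on overlaps
            items.append(((lo, hi), v2))
        elif in1:
            items.append(((lo, hi), v1))
    return items

# Toy usage: ranges ["a","m") and ["g","z") overlap on ["g","m").
print(split_ranges(("a", "m"), "value-1", ("g", "z"), "value-2"))
```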


Patent
Samy Bengio, Jeffrey Dean, Quoc V. Le, Jonathon Shlens, Yoram Singer
20 Dec 2013
TL;DR: In this article, an option may have an option score associated with it, and a relation score may be calculated for a first option and a second option corresponding to a second object in an image.
Abstract: Systems and techniques are disclosed for labeling objects within an image. The objects may be labeled by selecting an option from a plurality of options such that each option is a potential label for the object. An option may have an option score associated with it. Additionally, a relation score may be calculated for a first option and a second option corresponding to a second object in an image. The relation score may be based on a frequency, probability, or observance corresponding to the co-occurrence of text associated with the first option and the second option in a text corpus such as the World Wide Web. An option may be selected as a label for an object based on a global score calculated based at least on an option score and relation score associated with the option.
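
As a rough sketch of the selection rule described above, the code below scores every pair of candidate labels for two objects as the sum of their option scores plus a weighted relation score derived from text co-occurrence, and picks the pair with the best global score. The scores, weight, and co-occurrence table are illustrative assumptions.

```python
from itertools import product
import math

def best_labels(options_a, options_b, cooccur, relation_weight=0.5):
    """options_a / options_b map candidate labels to option scores for two
    objects in the same image; cooccur maps unordered label pairs to a
    co-occurrence count gathered from a text corpus."""
    best, best_score = None, float("-inf")
    for la, lb in product(options_a, options_b):
        relation = math.log1p(cooccur.get(frozenset((la, lb)), 0))
        score = options_a[la] + options_b[lb] + relation_weight * relation
        if score > best_score:
            best, best_score = (la, lb), score
    return best, best_score

# Toy usage: co-occurrence with "saddle" pulls the first object toward "horse"
# even though "zebra" has the higher individual option score.
cooccur = {frozenset(("horse", "saddle")): 120, frozenset(("zebra", "saddle")): 2}
print(best_labels({"horse": 0.55, "zebra": 0.60},
                  {"saddle": 0.70, "chair": 0.20}, cooccur))
```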

Patent
Emanuel Taropa, Jeffrey Dean, Simon Tong, Nitin Gupta, Noam Shazeer, Anwis Das, Murtaza A. Basrai
27 Jun 2013

Patent
04 Jun 2013
TL;DR: In this article, a method for deleting obsolete files from a file system is proposed: a target file is deleted only once the file system no longer contains any reference file whose file name matches the file name of the target file.
Abstract: A method for deleting obsolete files from a file system is provided. The method includes: receiving a request to delete a reference to a target file in a file system from a file reference data structure, wherein the file reference data structure includes target file names and reference file names; identifying a reference file name in the file reference data structure, wherein the reference file name includes a file name of the target file; deleting a reference file from the file system, wherein the reference file has the identified reference file name; checking whether the file system includes at least one reference file whose file name matches the file name of the target file; if there is no such reference file in the file system: deleting the target file from the file system; and deleting the file name of the target file from the file reference data structure.
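
A minimal sketch of the scheme described above, using an in-memory set of file names as a stand-in for the file system; the reference-file naming convention and the data structures are assumptions made for illustration.

```python
def delete_reference(file_system: set, ref_table: dict, target: str, owner: str) -> None:
    """Delete owner's reference to target; if no reference file whose name
    contains the target's name remains, the target is obsolete and is deleted
    too, along with its entry in the reference data structure."""
    ref_name = f"{target}.ref.{owner}"          # reference file name embeds the target's name
    file_system.discard(ref_name)               # delete the reference file itself
    ref_table.get(target, set()).discard(ref_name)
    still_referenced = any(name != target and target in name for name in file_system)
    if not still_referenced:
        file_system.discard(target)             # the target file is now obsolete
        ref_table.pop(target, None)

# Toy usage: two jobs hold references to the same chunk file.
fs = {"chunk-0001", "chunk-0001.ref.jobA", "chunk-0001.ref.jobB"}
refs = {"chunk-0001": {"chunk-0001.ref.jobA", "chunk-0001.ref.jobB"}}
delete_reference(fs, refs, "chunk-0001", "jobA")   # target survives: jobB still refers to it
delete_reference(fs, refs, "chunk-0001", "jobB")   # last reference gone: target deleted
print(fs)                                           # -> set()
```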