
Showing papers by "Greg S. Corrado" published in 2014


Proceedings Article
01 Jan 2014
TL;DR: A simple method is proposed for constructing an image embedding system from any existing image classifier and a semantic word embedding model that contains the $n$ class labels in its vocabulary; it outperforms state-of-the-art methods on the ImageNet zero-shot learning task.
Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional $n$-way classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing $n$-way image classifier and a semantic word embedding model, which contains the $n$ class labels in its vocabulary. Our method maps images into the semantic embedding space via convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state-of-the-art methods on the ImageNet zero-shot learning task.

853 citations
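
The convex-combination mapping described in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the function names, the toy two-dimensional label vectors, the optional `top_t` truncation, and the use of cosine similarity to pick the nearest unseen label are not taken from the listing.

```python
import math

def convex_embedding(probs, label_vectors, top_t=None):
    """Embed an image as a convex combination of training-label vectors.

    probs: dict label -> classifier probability (hypothetical classifier output).
    label_vectors: dict label -> word embedding of that label.
    top_t: optionally keep only the top_t most probable labels (an assumption).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_t is not None:
        ranked = ranked[:top_t]
    total = sum(p for _, p in ranked)  # renormalise so the weights sum to 1
    dim = len(next(iter(label_vectors.values())))
    out = [0.0] * dim
    for label, p in ranked:
        weight = p / total
        for i, v in enumerate(label_vectors[label]):
            out[i] += weight * v
    return out

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_label(vec, candidate_vectors):
    """Return the candidate label whose vector is most similar to vec."""
    return max(candidate_vectors, key=lambda l: cosine(vec, candidate_vectors[l]))

# Toy data: labels the classifier was trained on, and labels it never saw.
seen = {"lion": [1.0, 0.0], "tiger": [0.9, 0.1], "car": [0.0, 1.0]}
unseen = {"liger": [0.95, 0.05], "truck": [0.1, 0.9]}
probs = {"lion": 0.6, "tiger": 0.3, "car": 0.1}

embedding = convex_embedding(probs, seen)
prediction = nearest_label(embedding, unseen)
```

No parameters are trained in this sketch: the classifier and the word vectors are taken as given, and only a weighted average and a nearest-neighbour lookup are computed.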


Posted Content
TL;DR: It is shown that bilingual embeddings learned using the proposed BilBOWA model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.
Abstract: We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data. This is achieved using a novel sampled bag-of-words cross-lingual objective, which is used to regularize two noise-contrastive language models for efficient cross-lingual feature learning. We show that bilingual embeddings learned using the proposed model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data.

335 citations
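
As a rough illustration of a bag-of-words cross-lingual objective of the kind the abstract above mentions, the sketch below penalises the distance between the mean word vector of a sentence and the mean word vector of its aligned translation. The squared Euclidean form, the function names, and the toy vectors are assumptions made for illustration; the listing itself does not spell out the loss.

```python
def mean_vector(vectors):
    """Average a non-empty list of equal-length word vectors (bag of words)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def bow_crosslingual_loss(src_vectors, tgt_vectors):
    """Squared distance between the mean vectors of two aligned sentences.

    No word alignment is used: each sentence is reduced to the average of
    its word vectors, so only sentence-level alignment is required.
    """
    src_mean = mean_vector(src_vectors)
    tgt_mean = mean_vector(tgt_vectors)
    return sum((a - b) ** 2 for a, b in zip(src_mean, tgt_mean))

# Toy example: a two-word source sentence and a one-word target sentence.
source_sentence = [[1.0, 0.0], [0.0, 1.0]]   # mean is [0.5, 0.5]
target_sentence = [[1.5, 0.5]]               # mean is [1.5, 0.5]
loss = bow_crosslingual_loss(source_sentence, target_sentence)
```

In a training setup, a term like this would act as a regulariser added to the monolingual objectives of the two languages, pulling translated sentences toward the same region of the shared space.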


Patent
Kai Chen1, Xiaodan Song1, Greg S. Corrado1, Kun Zhang1, Jeffrey Dean1, Bahman Rabii1 
13 Mar 2014
TL;DR: In this article, a set of relevance scores for each concept term in a pre-determined set of concept terms is calculated, where each of the respective relevance scores measures a predicted relevance of the corresponding concept term to the resource.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scoring concept terms using a deep network. One of the methods includes receiving an input comprising a plurality of features of a resource, wherein each feature is a value of a respective attribute of the resource; processing each of the features using a respective embedding function to generate one or more numeric values; processing the numeric values to generate an alternative representation of the features of the resource, wherein processing the numeric values comprises applying one or more non-linear transformations to the numeric values; and processing the alternative representation of the input to generate a respective relevance score for each concept term in a pre-determined set of concept terms, wherein each of the respective relevance scores measures a predicted relevance of the corresponding concept term to the resource.

25 citations
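
The three processing steps named in the abstract above (per-feature embedding, non-linear transformation, per-concept relevance score) can be sketched as a tiny forward pass. All tables, weights, attribute names, and concept terms below are hypothetical, and the choice of `tanh` and a sigmoid output is an assumption; the abstract does not specify them.

```python
import math

# Hypothetical embedding tables: one per attribute, mapping a feature value
# to a short list of numeric values.
EMBEDDINGS = {
    "language": {"en": [0.2, -0.1], "fr": [-0.3, 0.4]},
    "topic_word": {"soccer": [0.9, 0.1], "recipe": [-0.5, 0.7]},
}
# Hypothetical hidden-layer weights (2 hidden units x 4 embedded inputs).
HIDDEN_W = [[0.5, -0.2, 0.8, 0.1], [-0.4, 0.3, -0.6, 0.9]]
# Hypothetical output weights, one row per concept term in the fixed set.
OUTPUT_W = {"sports": [1.5, -1.0], "cooking": [-1.2, 1.4]}

def score_concepts(features):
    """Return a relevance score in (0, 1) for every concept term."""
    # Step 1: embed each feature and concatenate in a fixed attribute order.
    values = []
    for attribute in sorted(EMBEDDINGS):
        values.extend(EMBEDDINGS[attribute][features[attribute]])
    # Step 2: non-linear transformation into an alternative representation.
    hidden = [math.tanh(sum(w * v for w, v in zip(row, values)))
              for row in HIDDEN_W]
    # Step 3: one sigmoid relevance score per concept term.
    return {
        concept: 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(ws, hidden))))
        for concept, ws in OUTPUT_W.items()
    }

scores = score_concepts({"language": "en", "topic_word": "soccer"})
```

With these made-up weights, a resource whose features are `en` and `soccer` scores higher for the `sports` concept term than for `cooking`.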


Patent
19 Dec 2014

17 citations


Patent
Arthur Asuncion1, Johannes Christian Schuler1, Greg S. Corrado1, Kai Chen1, Yong Sheng1 
14 Jan 2014
TL;DR: In this article, a data processing system can obtain data identifying a global cluster that indicates an interest category and can create a sub-cluster of the global cluster based on a characteristic common to content access computing devices.
Abstract: Systems and methods of determining computing device characteristics from computer network activity are provided. A data processing system can obtain data identifying a global cluster that indicates an interest category and can create a sub-cluster of the global cluster based on a characteristic common to content access computing devices. A weight indicating a correlation between the characteristic common to content access computing devices and the interest category can be assigned to the sub-cluster. Responsive to a communication between a first content access computing device and a content publisher computing device, the data processing system can identify a characteristic. The data processing system can associate the first content access computing device with the sub-cluster based on the characteristic of the first content access computing device and the characteristic common to the content access computing devices, and based on the weight can determine a status of the first content access computing device.

1 citation