
Showing papers by "Jeffrey Dean" published in 2007


Proceedings Article
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz Josef Och, Jeffrey Dean
22 Jun 2007
TL;DR: Systems, methods, and computer program products for machine translation in which each n-gram's backoff score is determined as a function of a backoff factor and the relative frequency of the corresponding backoff n-gram in the corpus.
Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
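
The backoff scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the toy corpus, and the backoff factor of 0.4 are all assumptions made for the example. An n-gram seen in the corpus is scored by its relative frequency; an unseen one is scored by multiplying the score of its order n-1 backoff n-gram by the backoff factor.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram, counts, total_tokens, alpha=0.4):
    """Score the last token of `ngram` given the tokens before it.

    alpha is the backoff factor (0.4 is an assumed, illustrative value).
    """
    ngram = tuple(ngram)
    if len(ngram) == 1:
        # Unigram: relative frequency over the whole corpus.
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Seen n-gram: relative frequency given its context.
        return counts[ngram] / counts[ngram[:-1]]
    # Unseen n-gram: drop the first token and scale by the backoff factor.
    return alpha * backoff_score(ngram[1:], counts, total_tokens, alpha)


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
seen = backoff_score(("the", "cat"), counts, len(tokens))
unseen = backoff_score(("sat", "the", "mat"), counts, len(tokens))
```

Because the scores are not renormalized, they are not probabilities; the sketch only shows how the backoff factor and the relative frequency of the shorter n-gram combine.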

660 citations


Patent
16 Feb 2007
TL;DR: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including the use of large language models in machine translation, speech recognition, and other applications.
Abstract: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.

150 citations


Patent
08 May 2007
TL;DR: A computer-implemented method calculates first statistics about a user-identified event within a first subset of a database of events, selects a second subset based on those statistics, calculates second statistics within that subset, and merges the first and second statistics as statistics of the event within the entire database.
Abstract: A computer-implemented method includes calculating first statistics about a user-identified event within a first subset of a database of events; selecting a second subset of the database of events based on said first statistics; calculating second statistics about the user-identified event within the second subset of the database of events; merging the first and second statistics as statistics of the user-identified event within the entire database of events; and generating a result including at least a portion of the merged statistics of the user-identified event.

54 citations


01 Jan 2007
TL;DR: MapReduce was developed as a way of simplifying the development of large-scale computations at Google and allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Abstract: This chapter describes the design and implementation of MapReduce, a programming system for large-scale data processing problems. MapReduce was developed as a way of simplifying the development of large-scale computations at Google. MapReduce programs are automatically parallelized and executed on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required intermachine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
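
The programming model described in this abstract can be illustrated with a word count. The sketch below is a single-process toy, assuming hypothetical helper names (`map_words`, `reduce_counts`, `run_mapreduce`); it is not Google's API and does none of the partitioning, scheduling, or failure handling that the real runtime provides. It only shows the division of labor: the programmer writes a map function and a reduce function, and the runtime groups intermediate values by key.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map: emit (word, 1) for every word in one document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_counts(word, values):
    """Reduce: sum all partial counts emitted for one word."""
    return word, sum(values)


def run_mapreduce(inputs, mapper, reducer):
    """Toy sequential runtime: map, group by key, then reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            intermediate[out_key].append(out_value)
    # A real system would shuffle these groups across machines.
    return dict(reducer(k, vs) for k, vs in sorted(intermediate.items()))


docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

The user-supplied functions contain no parallelism or communication code, which is the property the abstract highlights.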

26 citations


01 Jan 2007
TL;DR: This work applies dynamic profile information to determine the dynamic execution frequency distributions of the classes of receivers at call sites and shows that these distributions are heavily skewed towards the most commonly occurring receiver class across several different languages.
Abstract: Dynamic binding slows down object-oriented programs. Dynamic dispatch mechanisms which work well where all receiver classes are equally likely are too pessimistic because at most call sites one receiver class predominates. We apply dynamic profile information to determine the dynamic execution frequency distributions of the classes of receivers at call sites. We show that these distributions are heavily skewed towards the most commonly occurring receiver class across several different languages. Moreover, we show that the distributions are stable across program inputs, from one version of a program to another, and even to some extent across programs that share library code. Finally, we demonstrate that significant run-time performance improvements for object-oriented programs can be gained by exploiting the information contained in dynamic receiver class distributions in a relatively simple optimizing compiler.
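
A receiver class distribution of the kind this abstract describes can be gathered with a small profiler. The following is a sketch under stated assumptions: the `ReceiverProfile` class, the call-site label, and the toy `Circle` and `Square` classes are hypothetical and are not taken from the paper. It records the class of the receiver each time a call site executes and reports the dominant class and its share of calls.

```python
from collections import Counter, defaultdict


class ReceiverProfile:
    """Per-call-site histogram of receiver classes (illustrative only)."""

    def __init__(self):
        self.sites = defaultdict(Counter)

    def record(self, site, receiver):
        # Count the dynamic class of the receiver at this call site.
        self.sites[site][type(receiver).__name__] += 1

    def dominant(self, site):
        # Return the most frequent receiver class and its share of calls.
        counts = self.sites[site]
        cls, hits = counts.most_common(1)[0]
        return cls, hits / sum(counts.values())


class Circle:
    def area(self):
        return 3.0


class Square:
    def area(self):
        return 4.0


profile = ReceiverProfile()
shapes = [Circle()] * 9 + [Square()]
for shape in shapes:
    profile.record("total_area:loop", shape)  # hypothetical call-site label
    shape.area()

dominant_class, share = profile.dominant("total_area:loop")
```

A compiler could use such a skewed distribution to insert a class test for the dominant receiver and call or inline its method directly, falling back to ordinary dynamic dispatch for other classes.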

10 citations


Proceedings Article
Jeffrey Dean
01 Jan 2007

7 citations


Patent
09 Jan 2007
TL;DR: To give users the most relevant, high-quality search results for a query, a system and method identify a document, obtain history data associated with it, and score the document based at least in part on that history data.
Abstract: PROBLEM TO BE SOLVED: To provide a search result of highest relevance, that is a high quality search result to a user by a search engine, in response to a search query by the user. SOLUTION: A system (125) and a method are obtained, which specify a document, acquire one or more history data associated with the document, and generate a score for the document based on at least a part of one or more history data. COPYRIGHT: (C)2007,JPO&INPIT

2 citations


Patent
Simon Tong, Jeffrey Dean
26 Dec 2007
TL;DR: A system automatically creates a list from items in existing lists by assigning weights to those items based on one or more example items corresponding to the list.
Abstract: A system automatically creates a list from items in existing lists. The system receives one or more example items corresponding to the list and assigns weights to the items in the existing lists based on the one or more example items. The system then forms the list based on the items and the weights assigned to the items.
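
One way to read the steps in this abstract is sketched below. The weighting rule is an assumption chosen for illustration, since the abstract does not specify one: each existing list is weighted by the fraction of example items it contains, and each item accumulates the weights of the lists in which it appears. The function name `build_list` and the sample data are hypothetical, and the sketch assumes at least one example item is given.

```python
from collections import defaultdict


def build_list(examples, existing_lists, size):
    """Form a new list from existing lists, guided by example items."""
    examples = set(examples)
    scores = defaultdict(float)
    for items in existing_lists:
        items = set(items)
        # Assumed weighting: share of the examples found in this list.
        overlap = len(examples & items) / len(examples)
        if overlap == 0:
            continue
        for item in items:
            scores[item] += overlap
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:size]


existing = [
    ["honda", "toyota", "ford", "bmw"],
    ["toyota", "honda", "nissan"],
    ["apple", "ford", "pear"],
]
cars = build_list(["honda", "toyota"], existing, size=4)
```

Here the third list shares no item with the examples, so its items receive no weight from it, and the result is drawn from the two car lists.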

1 citation



Patent
16 Feb 2007
TL;DR: A method of using large language models in machine translation in which a translation model is partitioned into a plurality of language model partitions stored on a plurality of different language model servers.
Abstract: A method of using large language models in machine translation in which a translation model is partitioned into a plurality of language model partitions stored on a plurality of different language model servers. Segments of text are distributed to the servers for translation according to server workload.
