Showing papers by "Jeffrey Dean" published in 2010
•
Abstract: MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.
1,293 citations
•
12 Jan 2010
TL;DR: Describes a large-scale data processing system and method for a distributed and parallel processing environment, built on an application-independent framework with a plurality of application-independent map modules and reduce modules.
Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.
87 citations
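The split described in the abstract above, where an application-independent framework applies user-specified map and reduce operators, can be illustrated with a minimal single-process sketch. The names `run_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical and chosen only for illustration. The sketch does not model distribution, parallelization, or fault tolerance, and it is not the implementation the entry describes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(
    inputs: Iterable[str],
    map_op: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_op: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical application-independent driver (single process only)."""
    # Map phase: apply the user-specified map operator to every input record
    # and group the intermediate values by key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for record in inputs:
        for key, value in map_op(record):
            intermediate[key].append(value)
    # Reduce phase: apply the user-specified reduce operator to each group.
    return {key: reduce_op(key, values) for key, values in sorted(intermediate.items())}


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Application-specific map operator: emit (word, 1) for each word."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Application-specific reduce operator: sum the counts for one word."""
    return sum(counts)


counts = run_mapreduce(
    ["the map operator", "the reduce operator"],
    word_count_map,
    word_count_reduce,
)
```

Only the two small operator functions are specific to word counting; the driver never inspects what the keys or values mean, which is the separation the abstract emphasizes.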
•
01 Oct 2010
TL;DR: A method receives a document and an initial score for the document, determines that the rate or quantity of new links pointing to the document has decreased over time, classifies the document as stale, decreases its score, and ranks the document against at least one other document based on the updated score.
Abstract: A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score.
32 citations
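The steps listed in the abstract above (detect a decrease in new links, classify the document as stale, lower its score, then rank) can be sketched as follows. The abstract does not say how the decrease is measured or by how much the score is reduced, so the windowed comparison in `is_stale` and the `STALE_PENALTY` factor are assumptions made purely for illustration, as are all names in the block.

```python
from dataclasses import dataclass
from typing import List

# Assumption: a stale document keeps half its initial score.
# The source gives no actual value or formula.
STALE_PENALTY = 0.5


@dataclass
class Document:
    name: str
    initial_score: float
    new_links_per_period: List[int]  # counts of new inbound links, oldest first


def is_stale(doc: Document, window: int = 2) -> bool:
    """Hypothetical test: the average number of new links in the most recent
    window is lower than in the window immediately before it."""
    counts = doc.new_links_per_period
    if len(counts) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(counts[-window:]) / window
    earlier = sum(counts[-2 * window:-window]) / window
    return recent < earlier


def updated_score(doc: Document) -> float:
    """Decrease the initial score only when the document is classified as stale."""
    if is_stale(doc):
        return doc.initial_score * STALE_PENALTY
    return doc.initial_score


def rank(docs: List[Document]) -> List[Document]:
    """Order documents by updated score, highest first."""
    return sorted(docs, key=updated_score, reverse=True)


docs = [
    Document("a", 10.0, [8, 7, 2, 1]),  # new links are drying up
    Document("b", 8.0, [1, 2, 3, 4]),   # new links are increasing
]
ranked_names = [d.name for d in rank(docs)]
```

In this toy data, document `a` starts with the higher score but is classified as stale, so it ranks below `b` once the penalty is applied.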
•
10 Jun 2010
30 citations
•
12 Oct 2010
TL;DR: A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based on the score.
Abstract: A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score.
10 citations