TL;DR: A large-scale data processing system and method for distributed, parallel processing environments, built on an application-independent framework that has a plurality of application-independent map modules and reduce modules.

Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.
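
The split between an application-independent framework and user-specified map and reduce operators can be illustrated with a small single-machine sketch. Everything named here (`run_mapreduce`, `word_count_map`, `word_count_reduce`, the sample files) is hypothetical and is not taken from the patent; a thread pool stands in for the distributed map modules, and the example only shows how the framework, rather than the user code, owns parallelization and grouping.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(inputs, map_op, reduce_op, workers=4):
    """Application-independent part: knows nothing about what map_op/reduce_op compute."""
    # Map phase: apply the user's map operator to every input file in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda item: list(map_op(*item)), inputs.items()))

    # Group the intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: apply the user's reduce operator to each key's values.
    return {key: reduce_op(key, values) for key, values in groups.items()}


# Application-specific part: a word count, chosen only as an illustration.
def word_count_map(filename, contents):
    for word in contents.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    return sum(counts)


# Hypothetical in-memory stand-in for the user-specified set of input files.
files = {"a.txt": "the quick fox", "b.txt": "The lazy dog the end"}
counts = run_mapreduce(files, word_count_map, word_count_reduce)
```

Swapping in a different pair of operators changes the computation without touching `run_mapreduce`, which is the separation the abstract describes.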

87 citations

Patent

Document scoring based on link-based criteria

Anurag Acharya¹, Matt Cutts¹, Jeffrey Dean¹, Paul Haahr¹, Monika Henzinger¹, Steve Lawrence¹, Karl Pfleger¹, Simon Tong¹•Institutions (1)

Google¹

01 Oct 2010

TL;DR: A method receives a document and an initial score for it, determines that the rate or quantity of new links pointing to the document has decreased over time, classifies the document as stale, decreases the initial score to produce an updated score, and ranks the document against at least one other document based, at least in part, on the score.

Abstract: A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score.
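
The steps in this abstract (detect a drop in new links, classify as stale, lower the score, rank) can be sketched as follows. The abstract does not say how a decrease is measured or by how much the score drops, so the window comparison, the `penalty` factor, and all names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    initial_score: float
    new_links_per_period: list  # counts of newly seen inbound links, oldest first


def link_rate_decreased(history, window=3):
    """Assumed test: the recent window averages fewer new links than the one before it."""
    if len(history) < 2 * window:
        return False  # not enough history to compare two windows
    earlier = sum(history[-2 * window:-window]) / window
    recent = sum(history[-window:]) / window
    return recent < earlier


def updated_score(doc, penalty=0.5):
    """Decrease the initial score when the document is classified as stale."""
    stale = link_rate_decreased(doc.new_links_per_period)
    return doc.initial_score * penalty if stale else doc.initial_score


def rank(docs):
    """Order documents by updated score, highest first."""
    return sorted(docs, key=updated_score, reverse=True)


docs = [
    Doc("growing", 0.8, [1, 2, 3, 4, 5, 6]),
    Doc("fading", 0.9, [9, 8, 7, 3, 2, 1]),
]
ranking = [d.doc_id for d in rank(docs)]
```

In this toy data the document with the higher initial score ends up ranked lower, because its new-link rate fell and its score was reduced.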

32 citations

Proceedings Article

Evolution and future directions of large-scale storage and computation systems at Google

Jeffrey Dean¹•Institutions (1)

Google¹

10 Jun 2010

30 citations

Patent

Document scoring based on document inception date

Matt Cutts¹, Jeffrey Dean¹, Paul Haahr¹, Monika Henzinger¹, Steve Lawrence¹, Karl Pfleger¹, Simon Tong¹•Institutions (1)

Google¹

12 Oct 2010

TL;DR: A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based on the score.

Abstract: A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score.

10 citations

MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs.

Jeffrey Dean, Sanjay Ghemawat

01 Jan 2010

3 citations