Showing papers by "Jeffrey Dean" published in 2008

Open Access

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean¹, Sanjay Ghemawat¹

Google¹

01 Jan 2008 - Communications of the ACM

TL;DR: This article explains how the MapReduce runtime system automatically parallelizes a computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
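
To make the programming model in the abstract concrete, here is a minimal single-process sketch in Python of a word count expressed as a map and a reduce function. It is an illustration only, not Google's implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the driver runs sequentially, omitting the parallelization, failure handling, and communication scheduling that the real runtime provides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """User-supplied map: emit an intermediate (word, 1) pair per word."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """User-supplied reduce: merge all values emitted for one key."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical sequential driver (assumption: everything fits in memory).

    A real runtime would spread the map and reduce calls over many machines.
    """
    # Group intermediate values by key, standing in for the shuffle step.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Apply the reduce function once per distinct intermediate key.
    return {key: reducer(key, values) for key, values in groups.items()}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

The user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is the part the abstract says the runtime takes over.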

17,663 citations

Journal Article

Bigtable: A Distributed Storage System for Structured Data

Fay W. Chang¹, Jeffrey Dean¹, Sanjay Ghemawat¹, Wilson C. Hsieh¹, Deborah A. Wallach¹, Michael Burrows¹, Tushar Deepak Chandra¹, Andrew Fikes¹, Robert E. Gruber¹

Google¹

01 Jun 2008 - ACM Transactions on Computer Systems

TL;DR: This article describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, along with the design and implementation of Bigtable.

Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

3,259 citations

Patent

Associating Application-Specific Methods with Tables Used for Data Storage

Jeffrey Dean¹, Sanjay Ghemawat¹, Andrew Fikes¹, Yasushi Saito¹

Google¹

08 Oct 2008

TL;DR: This patent describes a method of accessing data that stores a table as a plurality of tablets corresponding to distinct non-overlapping table portions and associates a distinct application object and a distinct tablet with each tablet access object.

Abstract: A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client.
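
The flow in the abstract can be sketched roughly as follows in Python. This is a loose illustration under stated assumptions, not the patented design: the class names (`TabletAccess`, `ShoutApp`, `TabletUnit`, `Table`) are hypothetical, tablets are modeled as row-key ranges identified by their start key, the "additional computational operation" is an arbitrary uppercase transform, and round-robin placement stands in for the unspecified load-balancing criterion.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TabletAccess:
    """Hypothetical tablet access object: serves one non-overlapping key range."""
    start_key: str  # inclusive lower bound of this tablet's range
    rows: Dict[str, str] = field(default_factory=dict)

    def read(self, key: str) -> str:
        return self.rows[key]


class ShoutApp:
    """Hypothetical application object: one instance per tablet."""

    def __init__(self) -> None:
        self.calls = 0

    def process(self, value: str) -> str:
        self.calls += 1
        return value.upper()  # placeholder for the extra computation


@dataclass
class TabletUnit:
    access: TabletAccess
    app: ShoutApp
    server: str  # the access and application objects move together


class Table:
    def __init__(self, units: List[TabletUnit]) -> None:
        self.units = sorted(units, key=lambda u: u.access.start_key)
        self.starts = [u.access.start_key for u in self.units]

    def locate(self, key: str) -> TabletUnit:
        # Find the tablet with the greatest start key that is <= key.
        idx = bisect_right(self.starts, key) - 1
        if idx < 0:
            raise KeyError(key)
        return self.units[idx]

    def handle(self, key: str) -> str:
        unit = self.locate(key)
        value = unit.access.read(key)   # data access operation
        return unit.app.process(value)  # additional computation on the result

    def rebalance(self, servers: List[str]) -> None:
        # Assumption: round-robin stands in for the load-balancing criterion.
        for i, unit in enumerate(self.units):
            unit.server = servers[i % len(servers)]


table = Table([
    TabletUnit(TabletAccess("", {"apple": "red"}), ShoutApp(), "s1"),
    TabletUnit(TabletAccess("m", {"pear": "green"}), ShoutApp(), "s1"),
])
table.rebalance(["s1", "s2"])
result = table.handle("pear")
```

Pairing each tablet access object with its own application object in one unit means a request is answered on whichever server currently holds the tablet, with no second hop for the computation.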

2 citations