Evaluating MapReduce for Multi-core and Multiprocessor Systems
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis
Computer Systems Laboratory
Stanford University
Email addresses: {cranger, ramananr, penmetsa}@stanford.edu, garybradski@gmail.com, and christos@ee.stanford.edu.
Abstract
This paper evaluates the suitability of the MapReduce
model for multi-core and multi-processor systems. MapRe-
duce was created by Google for application development
on data-centers with thousands of servers. It allows pro-
grammers to write functional-style code that is automati-
cally parallelized and scheduled in a distributed system.
We describe Phoenix, an implementation of MapReduce
for shared-memory systems that includes a programming
API and an efficient runtime system. The Phoenix run-
time automatically manages thread creation, dynamic task
scheduling, data partitioning, and fault tolerance across
processor nodes. We study Phoenix with multi-core and
symmetric multiprocessor systems and evaluate its perfor-
mance potential and error recovery features. We also com-
pare MapReduce code to code written in lower-level APIs
such as P-threads. Overall, we establish that, given a care-
ful implementation, MapReduce is a promising model for
scalable performance on shared-memory systems with sim-
ple parallel code.
1 Introduction
As multi-core chips become ubiquitous, we need parallel
programs that can exploit more than one processor. Tradi-
tional parallel programming techniques, such as message-
passing and shared-memory threads, are too cumbersome
for most developers. They require that the programmer
manages concurrency explicitly by creating threads and
synchronizing them through messages or locks. They also
require manual management of data locality. Hence, it is
very difficult to write correct and scalable parallel code for
non-trivial algorithms. Moreover, the programmer must of-
ten re-tune the code when the application is ported to a dif-
ferent or larger-scale system.
To simplify parallel coding, we need to develop two com-
ponents: a practical programming model that allows users
to specify concurrency and locality at a high level and an
efficient runtime system that handles low-level mapping, re-
source management, and fault tolerance issues automati-
cally regardless of the system characteristics or scale. Nat-
urally, the two components are closely linked. Recently,
there has been a significant body of research towards these
goals using approaches such as streaming [13, 15], mem-
ory transactions [14, 5], data-flow based schemes [2], asyn-
chronous parallelism, and partitioned global address space
languages [6, 1, 7].
This paper presents Phoenix, a programming API and
runtime system based on Google’s MapReduce model [8].
MapReduce borrows two concepts from functional lan-
guages to express data-intensive algorithms. The Map func-
tion processes the input data and generates a set of interme-
diate key/value pairs. The Reduce function properly merges
the intermediate pairs which have the same key. Given such
a functional specification, the MapReduce runtime automat-
ically parallelizes the computation by running multiple map
and/or reduce tasks in parallel over disjoint portions of
the input or intermediate data. Google’s MapReduce im-
plementation facilitates processing of terabytes on clusters
with thousands of nodes. The Phoenix implementation is
based on the same principles but targets shared-memory
systems such as multi-core chips and symmetric multipro-
cessors.
Phoenix uses threads to spawn parallel Map or Reduce
tasks. It also uses shared-memory buffers to facilitate com-
munication without excessive data copying. The runtime
schedules tasks dynamically across the available processors
in order to achieve load balance and maximize task through-
put. Locality is managed by adjusting the granularity and
assignment of parallel tasks. The runtime automatically re-
covers from transient and permanent faults during task exe-
cution by repeating or re-assigning tasks and properly merg-
ing their output with that from the rest of the computation.
Overall, the Phoenix runtime handles the complicated con-
currency, locality, and fault-tolerance tradeoffs that make
parallel programming difficult. Nevertheless, it also allows
the programmer to provide application specific knowledge
such as custom data partitioning functions (if desired).
We evaluate Phoenix on commercial multi-core and multiprocessor systems and demonstrate that it leads to scal-
able performance in both environments. Through fault in-
jection experiments, we show that Phoenix can handle per-
manent and transient faults during Map and Reduce tasks
at a small performance penalty. Finally, we compare the
performance of Phoenix code to tuned parallel code written
directly with P-threads. Despite the overheads associated
with the MapReduce model, Phoenix provides similar per-
formance for many applications. Nevertheless, the stylized
key management and additional data copying in MapRe-
duce lead to significant performance losses for some ap-
plications. Overall, even though MapReduce may not be
applicable to all algorithms, it can be a valuable tool for
simple parallel programming and resource management on
shared-memory systems.
The rest of the paper is organized as follows. Section
2 provides an overview of MapReduce, while Section 3
presents our shared-memory implementation. Section 4 de-
scribes our evaluation methodology and Section 5 presents
the evaluation results. Section 6 reviews related work and
Section 7 concludes the paper.
2 MapReduce Overview
This section summarizes the basic principles of the
MapReduce model.
2.1 Programming Model
The MapReduce programming model is inspired by func-
tional languages and targets data-intensive computations.
The input data format is application-specific, and is spec-
ified by the user. The output is a set of <key,value>
pairs. The user expresses an algorithm using two functions,
Map and Reduce. The Map function is applied on the in-
put data and produces a list of intermediate <key,value>
pairs. The Reduce function is applied to all intermediate
pairs with the same key. It typically performs some kind of
merging operation and produces zero or more output pairs.
Finally, the output pairs are sorted by their key value. In
the simplest form of MapReduce programs, the program-
mer provides just the Map function. All other functionality,
including the grouping of the intermediate pairs which have
the same key and the final sorting, is provided by the run-
time.
The following pseudocode shows the basic structure of a
MapReduce program that counts the number of occurrences
of each word in a collection of documents [8]. The map
function emits each word in the documents with the tempo-
rary count 1. The reduce function sums the counts for each
unique word.
// input: a document
// intermediate output: key=word; value=1
Map(void *input) {
    for each word w in input
        EmitIntermediate(w, 1);
}
// intermediate output: key=word; value=1
// output: key=word; value=occurrences
Reduce(String key, Iterator values) {
    int result = 0;
    for each v in values
        result += v;
    Emit(key, result);
}
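For concreteness, here is a small trace of this program on a hypothetical two-document input (the documents and words are purely illustrative):

// documents: d1 = "map reduce map", d2 = "reduce"
// Map(d1) emits: (map,1) (reduce,1) (map,1); Map(d2) emits: (reduce,1)
// grouping by key: map -> [1,1], reduce -> [1,1]
// Reduce output, sorted by key: (map,2) (reduce,2)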
The main benefit of this model is simplicity. The pro-
grammer provides a simple description of the algorithm that
focuses on functionality and not on parallelization. The ac-
tual parallelization and the details of concurrency manage-
ment are left to the runtime system. Hence the program
code is generic and easily portable across systems. Nev-
ertheless, the model provides sufficient high-level informa-
tion for parallelization. The Map function can be executed
in parallel on non-overlapping portions of the input data and
the Reduce function can be executed in parallel on each set
of intermediate pairs with the same key. Similarly, since
it is explicitly known which pairs each function will oper-
ate upon, one can employ prefetching or other scheduling
optimizations for locality.
The critical question is how widely applicable is the
MapReduce model. Dean and Ghemawat provided several
examples of data-intensive problems that were successfully
coded with MapReduce, including a production indexing
system, distributed grep, web-link graph construction, and
statistical machine translation [8]. A recent study by Intel
has also concluded that many data-intensive computations
can be expressed as sums over data points [9]. Such compu-
tations should be a good match for the MapReduce model.
Nevertheless, an extensive evaluation of the applicability
and ease-of-use of the MapReduce model is beyond the
scope of this work. Our goal is to provide an efficient im-
plementation on shared-memory systems that demonstrates
its feasibility and enables programmers to experiment with
this programming approach.
2.2 Runtime System
The MapReduce runtime is responsible for paralleliza-
tion and concurrency control. To parallelize the Map func-
tion, it splits the input pairs into units that are processed
concurrently on multiple nodes. Next, the runtime parti-
tions the intermediate pairs using a scheme that keeps pairs
with the same key in the same unit. The partitions are
processed in parallel by Reduce tasks running on multi-
ple nodes. In both steps, the runtime must decide on fac-
tors such as the size of the units, the number of nodes in-
volved, how units are assigned to nodes dynamically, and
how buffer space is allocated. The decisions can be fully
automatic or guided by the programmer given application-specific knowledge (e.g., number of pairs produced by each
function or the distribution of keys). These decisions allow
the runtime to execute a program efficiently across a wide
range of machines and dataset scenarios without modifica-
tions to the source code. Finally, the runtime must merge
and sort the output pairs from all Reduce tasks.
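As a concrete illustration of this partitioning step, a minimal hash-based scheme sends every pair with a given key to the same Reduce unit. This is only a sketch; Phoenix's default partition function is hash-based (see Table 1), but the particular hash below is an assumption for illustration.

// Sketch: all pairs with equal keys hash to the same Reduce unit, so each
// key is processed by exactly one Reduce task.
int reduce_unit_for_key(const void *key, int key_size, int num_reduce_units) {
    const unsigned char *bytes = (const unsigned char *)key;
    unsigned long hash = 5381;
    for (int i = 0; i < key_size; i++)
        hash = hash * 33 + bytes[i];          /* djb2-style byte hash */
    return (int)(hash % (unsigned long)num_reduce_units);
}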
The runtime can perform several optimizations. It can re-
duce function-call overheads by increasing the granularity
of Map or Reduce tasks. It can also reduce load imbal-
ance by adjusting task granularity or the number of nodes
used. The runtime can also optimize locality in several
ways. First, each node can prefetch pairs for its current
Map or Reduce tasks using hardware or software schemes.
A node can also prefetch the input for its next Map or Re-
duce task while processing the current one, which is simi-
lar to the double-buffering schemes used in streaming mod-
els [23]. Bandwidth and cache space can be preserved using
hardware compression of intermediate pairs which tend to
have high redundancy [10].
The runtime can also assist with fault tolerance. When it
detects that a node has failed, it can re-assign the Map or
Reduce task it was processing at the time to another node.
To avoid interference, the replicated task will use separate
output buffers. If a portion of the memory is corrupted, the
runtime can re-execute just the necessary Map or Reduce
tasks that will re-produce the lost data. It is also possible to
produce a meaningful partial or approximated output even
when some input or intermediate data is permanently lost.
Moreover, the runtime can dynamically adjust the number
of nodes it uses to deal with failures or power and tempera-
ture related issues.
Google’s runtime implementation targets large clusters of
Linux PCs connected through Ethernet switches [3]. Tasks
are forked using remote procedure calls. Buffering and
communication occurs by reading and writing files on a dis-
tributed file system [12]. The locality optimizations focus
mostly on avoiding remote file accesses. While such a sys-
tem is effective with distributed computing [8], it leads to
very high overheads if used with shared-memory systems
that facilitate communication through memory and are typ-
ically of much smaller scale.
The critical question for the runtime is how significant
are the overheads it introduces. The MapReduce model re-
quires that data is associated with keys and that pairs are
handled in a specific manner at each execution step. Hence,
there can be non-trivial overheads due to key management,
data copying, data sorting, or memory allocation between
execution steps. While programmers may be willing to sac-
rifice some of the parallel efficiency in return for a simple
programming model, we must show that the overheads are
not overwhelming.
3 The Phoenix System
Phoenix implements MapReduce for shared-memory
systems. Its goal is to support efficient execution on mul-
tiple cores without burdening the programmer with concur-
rency management. Phoenix consists of a simple API that
is visible to application programmers and an efficient run-
time that handles parallelization, resource management, and
fault recovery.
3.1 The Phoenix API
The current Phoenix implementation provides an
application-programmer interface (API) for C and C++.
However, similar APIs can be defined for languages like
Java or C#. The API includes two sets of functions sum-
marized in Table 1. The first set is provided by Phoenix
and is used by the programmer’s application code to ini-
tialize the system and emit output pairs (1 required and
2 optional functions). The second set includes the func-
tions that the programmer defines (3 required and 2 optional
functions). Apart from the Map and Reduce functions, the
user provides functions that partition the data before each
step and a function that implements key comparison. Note
that the API is quite small compared to other models. The
API is type agnostic. The function arguments are declared
as void pointers wherever possible to provide flexibility in
their declaration and fast use without conversion overhead.
In contrast, the Google implementation uses strings for ar-
guments as string manipulation is inexpensive compared to
remote procedure calls and file accesses.
The data structure used to communicate basic function
information and buffer allocation between the user code and
runtime is of type scheduler_args_t. Its fields are sum-
marized in Table 2. The basic fields provide pointers to in-
put/output data buffers and to the user-provided functions.
They must be properly set by the programmer before call-
ing phoenix_scheduler(). The remaining fields are
optionally used by the programmer to control scheduling
decisions by the runtime. We discuss these decisions further
in Section 3.2.4. There are additional data structure types to
facilitate communication between the Splitter, Map, Parti-
tion, and Reduce functions. These types use pointers when-
ever possible to implement communication without actually
copying significant amounts of data.
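As an illustration, the following sketch shows how the word count example of Section 2.1 might be expressed with this API. The function signatures follow Table 1, but the scheduler_args_t field names are paraphrased from Table 2 and the input-loading and Splitter helpers are assumed, so the identifiers here are illustrative rather than the exact ones in the Phoenix headers.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
/* #include "phoenix.h"  -- assumed header providing the API of Table 1 */

/* Map: emit (word, 1) for every word in this task's input chunk. */
void wordcount_map(map_args_t *args) {
    char *save = NULL;
    char *text = (char *)args->data;               /* assumed field: this task's chunk */
    for (char *w = strtok_r(text, " \t\n", &save); w != NULL;
         w = strtok_r(NULL, " \t\n", &save))
        emit_intermediate(w, (void *)(intptr_t)1, (int)strlen(w) + 1);
}

/* Reduce: sum the counts collected for one key. */
void wordcount_reduce(void *key, void **vals, int num_vals) {
    intptr_t count = 0;
    for (int i = 0; i < num_vals; i++)
        count += (intptr_t)vals[i];                /* each value is a cast integer */
    emit(key, (void *)count);
}

int wordcount_keycmp(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    scheduler_args_t args = {0};
    args.input_data  = load_documents();            /* hypothetical helper */
    args.data_size   = documents_length();          /* hypothetical helper; input size in bytes */
    args.output_data = malloc(OUTPUT_BUFFER_BYTES); /* buffer space allocated by the user */
    args.splitter    = wordcount_splitter;          /* hypothetical splitter_t, see Table 1 */
    args.map         = wordcount_map;
    args.reduce      = wordcount_reduce;
    args.key_cmp     = wordcount_keycmp;
    return phoenix_scheduler(&args);                /* initializes the runtime and runs the job */
}

The optional fields of Table 2 (for example, Unit size or the maximum number of Map workers) can be set on the same structure to guide the scheduling decisions discussed in Section 3.2.4.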
The API guarantees that within a partition of the interme-
diate output, the pairs will be processed in key order. This
makes it easier to produce a sorted final output which is of-
ten desired. There is no guarantee in the processing order of
the original input during the Map stage. These assumptions
did not cause any complications with the programs we ex-
amined. In general it is up to the programmer to verify that
the algorithm can be expressed with the Phoenix API given
these restrictions.
Functions Provided by Runtime
int phoenix_scheduler(scheduler_args_t *args)  [R]
    Initializes the runtime system. The scheduler_args_t struct provides the needed function and data pointers.
void emit_intermediate(void *key, void *val, int key_size)  [O]
    Used in Map to emit an intermediate output <key,value> pair. Required if the Reduce is defined.
void emit(void *key, void *val)  [O]
    Used in Reduce to emit a final output pair.
Functions Defined by User
int (*splitter_t)(void *, int, map_args_t *)  [R]
    Splits the input data across Map tasks. The arguments are the input data pointer, the unit size for each task, and the input buffer pointer for each Map task.
void (*map_t)(map_args_t *)  [R]
    The Map function. Each Map task executes this function on its input.
int (*partition_t)(int, void *, int)  [O]
    Partitions intermediate pairs for Reduce tasks based on their keys. The arguments are the number of Reduce tasks, a pointer to the keys, and the size of the key. Phoenix provides a default partitioning function based on key hashing.
void (*reduce_t)(void *, void **, int)  [O]
    The Reduce function. Each Reduce task executes this on its input. The arguments are a pointer to a key, a pointer to the associated values, and the value count. If not specified, Phoenix uses a default identity function.
int (*key_cmp_t)(const void *, const void *)  [R]
    Function that compares two keys.
Table 1. The functions in the Phoenix API. R and O identify required and optional functions respectively.
The Phoenix API does not rely on any specific compiler options and does not require a parallelizing compiler. However, it assumes that its functions can freely
use stack-allocated and heap-allocated structures for pri-
vate data. It also assumes that there is no communica-
tion through shared-memory structures other than the in-
put/output buffers for these functions. For C/C++, we can-
not check these assumptions statically for arbitrary pro-
grams. Although there are stringent checks within the sys-
tem to ensure valid data are communicated between user
and runtime code, eventually we trust the user to provide
functionally correct code. For Java and C#, static checks
that validate these assumptions are possible.
3.2 The Phoenix Runtime
The Phoenix runtime was developed on top of P-
threads [18], but can be easily ported to other shared-
memory thread packages.
3.2.1 Basic Operation and Control Flow
Figure 1 shows the basic data flow for the runtime system.
The runtime is controlled by the scheduler, which is initi-
ated by user code. The scheduler creates and manages the
threads that run all Map and Reduce tasks. It also manages
the buffers used for task communication. The programmer
provides the scheduler with all the required data and func-
tion pointers through the scheduler_args_t structure.
After initialization, the scheduler determines the number of
cores to use for this computation. For each core, it spawns
a worker thread that is dynamically assigned some number
of Map and Reduce tasks.
To start the Map stage, the scheduler uses the Splitter
to divide input pairs into equally sized units to be processed
by the Map tasks. The Splitter is called once per Map
task and returns a pointer to the data the Map task will pro-
cess. The Map tasks are allocated dynamically to work-
ers and each one emits intermediate <key,value> pairs.
The Partition function splits the intermediate pairs into
units for the Reduce tasks. The function ensures all values
of the same key go to the same unit. Within each buffer,
values are ordered by key to assist with the final sorting. At
this point, the Map stage is over. The scheduler must wait
for all Map tasks to complete before initiating the Reduce
stage.
Reduce tasks are also assigned to workers dynamically,
similar to Map tasks. The one difference is that, while with
Map tasks we have complete freedom in distributing pairs
across tasks, with Reduce we must process all values for the
same key in one task. Hence, the Reduce stage may exhibit
higher imbalance across workers and dynamic scheduling is
more important. The output of each Reduce task is already
sorted by key. As the last step, the final output from all tasks
is merged into a single buffer, sorted by keys. The merging
takes place in log2(P/2) steps, where P is the number of
workers used. While one can imagine cases where the out-
put pairs do not have to be ordered, our current implemen-
tation always sorts the final output as it is also the case in
Google’s implementation [8].
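The dynamic assignment of Map and Reduce tasks to workers can be as simple as each worker pulling the next unprocessed task from a shared counter, so that faster workers naturally take on more tasks. The sketch below shows this basic pattern on top of P-threads; it is illustrative only and not the actual Phoenix scheduler code.

#include <pthread.h>

/* Illustrative sketch of dynamic task assignment, not the Phoenix internals. */
typedef struct {
    pthread_mutex_t lock;
    int next_task;      /* index of the next unassigned task in this stage */
    int num_tasks;      /* total Map (or Reduce) tasks in this stage */
} task_queue_t;

/* Each worker calls this in a loop; returns -1 when the stage is finished. */
int take_next_task(task_queue_t *q) {
    int t = -1;
    pthread_mutex_lock(&q->lock);
    if (q->next_task < q->num_tasks)
        t = q->next_task++;
    pthread_mutex_unlock(&q->lock);
    return t;
}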

Field Description
Basic Fields
Input data Input data pointer; passed to the Splitter by the runtime
Data size Input dataset size
Output data Output data pointer; buffer space allocated by user
Splitter Pointer to Splitter function
Map Pointer to Map function
Reduce Pointer to Reduce function
Partition Pointer to Partition function
Key cmp Pointer to key compare function
Optional Fields for Performance Tuning
Unit size Pairs processed per Map/Reduce task
L1 cache size L1 data cache size in bytes
Num Map workers Maximum number of threads (workers) for Map tasks
Num Reduce workers Maximum number of threads (workers) for Reduce tasks
Num Merge workers Maximum number of threads (workers) for Merge tasks
Num procs Maximum number of processor cores used
Table 2. The scheduler_args_t data structure type.
3.2.2 Buffer Management
Two types of temporary buffers are necessary to store data
between the various stages. All buffers are allocated in
shared memory but are accessed in a well specified way by
a few functions. Whenever we have to re-arrange buffers
(e.g., split across tasks), we manipulate pointers instead of
the actual pairs, which may be large in size. The intermedi-
ate buffers are not directly visible to user code.
Map-Reduce buffers are used to store the intermediate
output pairs. Each worker has its own set of buffers. The
buffers are initially sized to a default value and then resized
dynamically as needed. At this stage, there may be multiple
pairs with the same key. To accelerate the Partition
function, the emit_intermediate function stores all
values for the same key in the same buffer. At the end of
the Map task, we sort each buffer by key order. Reduce-
Merge buffers are used to store the outputs of Reduce tasks
before they are sorted. At this stage, each key has only one
value associated with it. After sorting, the final output is
available in the user allocated Output data buffer.
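One possible shape for these per-worker intermediate buffers is sketched below. The types are illustrative assumptions rather than the actual Phoenix data structures, but they show the idea of keeping all values for a key together and growing the arrays on demand.

/* Illustrative only: not the actual Phoenix buffer layout. */
typedef struct {
    void  *key;             /* one distinct key seen by this worker */
    void **vals;            /* every value emitted for this key so far */
    int    num_vals;
    int    vals_capacity;   /* grown dynamically as pairs are emitted */
} keyed_values_t;

typedef struct {
    keyed_values_t *entries;    /* sorted by key at the end of each Map task */
    int num_entries;
    int entries_capacity;       /* buffers start at a default size and are resized */
} worker_buffer_t;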
3.2.3 Fault Recovery
The runtime provides support for fault tolerance for tran-
sient and permanent faults during Map and Reduce tasks. It
focuses mostly on recovery with some limited support for
fault detection.
Phoenix detects faults through timeouts. If a worker does
not complete a task within a reasonable amount of time,
then a failure is assumed. The execution time of similar
tasks on other workers is used as a yardstick for the timeout
interval. Of course, a fault may cause a task to complete
with incorrect or incomplete data instead of failing com-
pletely. Phoenix has no way of detecting this case on its own
and cannot stop an affected task from potentially corrupt-
ing the shared memory. To address this shortcoming, one
should combine the Phoenix runtime with known error de-
tection techniques [20, 21, 24]. Due to the functional nature
of the MapReduce model, Phoenix can actually provide in-
formation that simplifies error detection. For example, since
the address ranges for input and output buffers are known,
Phoenix can notify the hardware about which load/store ad-
dresses to shared structures should be considered safe for
each worker and which should signal a potential fault.
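A timeout check of this kind can be sketched as follows; the helper and the slack factor are assumptions for illustration, not the actual Phoenix heuristics.

/* Illustrative sketch: a task is suspected failed if it has run much longer
   than similar tasks that already completed on other workers. */
int task_suspected_failed(double elapsed_secs,
                          double mean_secs_of_similar_tasks,
                          double slack_factor /* e.g., 3.0; assumed value */) {
    if (mean_secs_of_similar_tasks <= 0.0)
        return 0;   /* no yardstick yet, so do not flag anything */
    return elapsed_secs > slack_factor * mean_secs_of_similar_tasks;
}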
Once a fault is detected or at least suspected, the runtime
attempts to re-execute the failed task. Since the original
task may still be running, separate output buffers are allo-
cated for the new task to avoid conflicts and data corruption.
When one of the two tasks completes successfully, the run-
time considers the task completed and merges its result with
the rest of the output data for this stage. The scheduler ini-
tially assumes that the fault was a transient one and assigns
the replicated task to the same worker. If the task fails a
few times or a worker exhibits a high frequency of failed
tasks overall, the scheduler assumes a permanent fault and
no further tasks are assigned to this worker.
The current Phoenix code does not provide fault recovery
for the scheduler itself. The scheduler runs only for a very
small fraction of the time and has a small memory footprint,
hence it is less likely to be affected by a transient error. On
the other hand, a fault in the scheduler has more serious im-
plications for the program correctness. We can use known
techniques such as redundant execution or checkpointing to
address this shortcoming.
Google’s MapReduce system uses a different approach

References
J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters.
S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system.
P. Charles et al. X10: an object-oriented approach to non-uniform cluster computing.
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language.
R. E. Ladner and M. J. Fischer. Parallel prefix computation.