Evaluating MapReduce for Multi-core and Multiprocessor Systems
Citations
MapReduce: simplified data processing on large clusters
Data-intensive applications, challenges, techniques and technologies: A survey on Big Data
Data mining with big data
Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library
Twister: a runtime for iterative MapReduce
References
MapReduce: simplified data processing on large clusters
The Google file system
X10: an object-oriented approach to non-uniform cluster computing
The implementation of the Cilk-5 multithreaded language
Parallel Prefix Computation
Frequently Asked Questions (19)
Q2. How can the runtime preserve bandwidth and cache space?
Bandwidth and cache space can be preserved using hardware compression of intermediate pairs, which tend to have high redundancy [10].
Q3. What is the importance of the MapReduce model?
The MapReduce model requires that data be associated with keys and that key-value pairs be handled in a specific manner at each execution step.
Q4. How can input be prefetched for the next Map or Reduce task?
A node can also prefetch the input for its next Map or Reduce task while processing the current one, which is similar to the double-buffering schemes used in streaming models [23].
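A minimal sketch of the double-buffering idea (not the Phoenix code): while the current chunk is processed, a helper thread reads the next chunk into the idle buffer. The read_chunk and process_chunk helpers and the file name are hypothetical placeholders.

/* Double-buffered input: overlap reading the next chunk with processing
 * the current one. */
#include <pthread.h>
#include <stdio.h>

#define CHUNK 4096

struct read_req { FILE *f; char *buf; size_t n; };

static void *read_chunk(void *arg) {              /* fill the idle buffer */
    struct read_req *r = arg;
    r->n = fread(r->buf, 1, CHUNK, r->f);
    return NULL;
}

static void process_chunk(const char *buf, size_t n) {
    /* stand-in for the Map work on the current chunk */
    size_t words = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == ' ' || buf[i] == '\n') words++;
    printf("processed %zu bytes, ~%zu words\n", n, words);
}

int main(void) {
    FILE *f = fopen("input.txt", "rb");
    if (!f) return 1;

    char buf[2][CHUNK];
    size_t cur_n = fread(buf[0], 1, CHUNK, f);    /* prime the first buffer */
    int cur = 0;

    while (cur_n > 0) {
        pthread_t t;
        struct read_req next = { f, buf[1 - cur], 0 };
        pthread_create(&t, NULL, read_chunk, &next);  /* prefetch next chunk */
        process_chunk(buf[cur], cur_n);               /* work on current one */
        pthread_join(t, NULL);
        cur_n = next.n;
        cur = 1 - cur;
    }
    fclose(f);
    return 0;
}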
Q5. What is the effect of distributing work across more cores on the heaps?
As work is distributed across more cores, the heaps accessed by each core are smaller and operations on them become significantly faster.
Q6. What is the function that sums the counts for each unique word?
The Reduce function sums the counts for each unique word:

// input: a document
// intermediate output: key=word; value=1
Map(void *input) {
    for each word w in input
        EmitIntermediate(w, 1);
}

// intermediate output: key=word; value=1
// output: key=word; value=occurrences
Reduce(String key, Iterator values) {
    int result = 0;
    for each v in values
        result += v;
    Emit(key, result);
}
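For reference, a self-contained sequential C sketch of the same WordCount, assuming a small in-memory input; it mimics the two phases (emit a (word, 1) pair per word, then sum the counts for each unique word) without the Phoenix runtime.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 1024
#define MAX_WORD  64

struct pair { char key[MAX_WORD]; int value; };

static struct pair pairs[MAX_PAIRS];
static int npairs = 0;

/* Map phase: emit (word, 1) for every word in the input. */
static void map(const char *input) {
    char copy[1024];
    strncpy(copy, input, sizeof(copy) - 1);
    copy[sizeof(copy) - 1] = '\0';
    for (char *w = strtok(copy, " \t\n"); w && npairs < MAX_PAIRS;
         w = strtok(NULL, " \t\n")) {
        strncpy(pairs[npairs].key, w, MAX_WORD - 1);
        pairs[npairs].key[MAX_WORD - 1] = '\0';
        pairs[npairs].value = 1;
        npairs++;
    }
}

static int by_key(const void *a, const void *b) {
    return strcmp(((const struct pair *)a)->key, ((const struct pair *)b)->key);
}

/* Reduce phase: sum the counts of each run of identical keys. */
static void reduce(void) {
    qsort(pairs, npairs, sizeof(struct pair), by_key);
    for (int i = 0; i < npairs; ) {
        int sum = 0, j = i;
        while (j < npairs && strcmp(pairs[j].key, pairs[i].key) == 0)
            sum += pairs[j++].value;
        printf("%s: %d\n", pairs[i].key, sum);
        i = j;
    }
}

int main(void) {
    map("the quick brown fox jumps over the lazy dog the fox");
    reduce();
    return 0;
}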
Q7. What are the main factors that affect the acceptance of programming models like MapReduce?
Apart from ease of use and scalability, two factors that may affect their acceptance are how well they run on existing hardware and whether they can tolerate errors.
Q8. How does the Phoenix implementation handle permanent and transient faults?
Through fault injection experiments, the authors show that Phoenix can handle permanent and transient faults during Map and Reduce tasks with a small performance penalty.
Q9. Which applications fit the MapReduce model well?
Certain applications, such as WordCount and ReverseIndex, fit well with the MapReduce model and lead to very compact and simple Phoenix code.
Q10. How many workers does the runtime currently spawn?
Number of Cores and Workers/Core: Since MapReduce programs are data-intensive, the authors currently spawn workers on all available cores.
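A hypothetical sketch of that policy: query the number of online cores with sysconf and spawn one worker thread per core; worker_main stands in for the loop that executes Map and Reduce tasks.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *worker_main(void *arg) {
    long id = (long)arg;
    printf("worker %ld started\n", id);
    /* ... fetch and execute Map/Reduce tasks here ... */
    return NULL;
}

int main(void) {
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);   /* available cores */
    if (ncores < 1) ncores = 1;

    pthread_t *workers = malloc(ncores * sizeof(pthread_t));
    for (long i = 0; i < ncores; i++)              /* one worker per core */
        pthread_create(&workers[i], NULL, worker_main, (void *)i);

    for (long i = 0; i < ncores; i++)
        pthread_join(workers[i], NULL);

    free(workers);
    return 0;
}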
Q11. What is the important conclusion from Figure 6?
The conclusion from Figure 6 is that, given an efficient implementation, MapReduce is an attractive model for some classes of computation.
Q12. What scheduling approaches can be employed?
In general, there are three scheduling approaches one can employ: 1) use a default policy for the specific system which has been developed taking into account its characteristics; 2) dynamically determine the best policy for each decision by monitoring resource availability and runtime behavior; 3) allow the programmer to provide application-specific policies.
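One way such policies could be plugged in, sketched with an assumed function-pointer interface; default_policy, dynamic_policy, and the queue layout are illustrative, not the Phoenix API.

#include <stdio.h>

struct task_queue { int next; int total; };

/* A policy returns the index of the next task to hand to a worker. */
typedef int (*sched_policy)(struct task_queue *q, int worker_id);

static int default_policy(struct task_queue *q, int worker_id) {
    (void)worker_id;
    return q->next < q->total ? q->next++ : -1;   /* simple FIFO default */
}

static int dynamic_policy(struct task_queue *q, int worker_id) {
    /* could consult monitored load or locality information before choosing */
    return default_policy(q, worker_id);
}

static sched_policy policy = default_policy;      /* programmer may override */

int main(void) {
    struct task_queue q = { 0, 4 };
    policy = dynamic_policy;                      /* e.g. switch at runtime */
    int t;
    while ((t = policy(&q, 0)) >= 0)
        printf("assigned task %d to worker 0\n", t);
    return 0;
}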
Q13. How can the runtime adapt its resource usage to the system load?
In a multi-programming environment, the scheduler can periodically check the system load and scale its usage based on system-wide priorities.
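A hedged illustration of that idea using getloadavg (glibc/BSD): sample the 1-minute load average periodically and shrink or grow the number of active workers; the thresholds and the active_workers variable are assumptions for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    long active_workers = ncores;

    for (int tick = 0; tick < 3; tick++) {        /* a few sample periods */
        double load[1] = { 0.0 };
        if (getloadavg(load, 1) == 1) {
            if (load[0] > (double)ncores && active_workers > 1)
                active_workers--;                 /* back off when oversubscribed */
            else if (load[0] < (double)ncores / 2 && active_workers < ncores)
                active_workers++;                 /* reclaim idle capacity */
        }
        printf("load %.2f -> %ld active workers\n", load[0], active_workers);
        sleep(1);
    }
    return 0;
}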
Q14. How can the runtime use a prefetch engine?
The runtime can trigger a prefetch engine that brings the data for the next task to the L2 cache in parallel with processing the current task.
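Hardware prefetch engines are platform-specific, but a software analogue of the same idea can be sketched with the GCC/Clang __builtin_prefetch hint: while the current chunk is processed, hint the lines of the next chunk toward the cache.

#include <stddef.h>
#include <stdio.h>

#define CHUNK 1024

static long process(const int *chunk, const int *next_chunk) {
    long sum = 0;
    for (size_t i = 0; i < CHUNK; i++) {
        if (next_chunk && (i % 16 == 0))
            __builtin_prefetch(&next_chunk[i], 0, 1);  /* read, low temporal locality */
        sum += chunk[i];                               /* current task's work */
    }
    return sum;
}

int main(void) {
    static int data[4][CHUNK];                         /* four "task" inputs */
    long total = 0;
    for (int t = 0; t < 4; t++)
        total += process(data[t], t + 1 < 4 ? data[t + 1] : NULL);
    printf("total = %ld\n", total);
    return 0;
}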
Q15. How can the runtime monitor the performance of a data-intensive program?
At the beginning of a data-intensive program, the runtime can vary the unit size and monitor the trends in the completion time or other performance indicators (processor utilization, number of misses, etc.) in order to select the best possible value.
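A hypothetical calibration sketch: run the same amount of work with several candidate unit sizes, time each pass, and keep the fastest; run_with_unit_size stands in for splitting the input into units and processing them.

#include <stdio.h>
#include <time.h>

#define WORK (1 << 22)

static volatile long sink;

static void run_with_unit_size(int unit) {
    long sum = 0;
    for (int start = 0; start < WORK; start += unit)
        for (int i = start; i < start + unit && i < WORK; i++)
            sum += i;
    sink = sum;                                   /* keep the work observable */
}

int main(void) {
    int candidates[] = { 1 << 10, 1 << 14, 1 << 18 };
    int best = candidates[0];
    double best_t = 1e30;

    for (int c = 0; c < 3; c++) {
        clock_t t0 = clock();
        run_with_unit_size(candidates[c]);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("unit %d: %.3f s\n", candidates[c], t);
        if (t < best_t) { best_t = t; best = candidates[c]; }
    }
    printf("selected unit size: %d\n", best);
    return 0;
}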
Q16. What are some examples of data-intensive problems that were successfully coded with MapReduce?
Dean and Ghemawat provided several examples of data-intensive problems that were successfully coded with MapReduce, including a production indexing system, distributed grep, web-link graph construction, and statistical machine translation [8].
Q17. What is the algorithm for calculating the sum of squares?
The algorithm assigns different portions of the file to different Map tasks, which compute certain summary statistics like the sum of squares.
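A hedged sketch of that structure: each "Map task" computes partial statistics (count, sum, sum of squares) over its own slice of the data, and a final combining step adds the partial results; the data layout and values are assumed for illustration.

#include <stdio.h>

#define N 1000
#define TASKS 4

struct stats { long n; double sum_x; double sum_xx; };

/* Map: summarize one contiguous portion of the data. */
static struct stats map_portion(const double *x, int lo, int hi) {
    struct stats s = { 0, 0.0, 0.0 };
    for (int i = lo; i < hi; i++) {
        s.n++;
        s.sum_x  += x[i];
        s.sum_xx += x[i] * x[i];
    }
    return s;
}

int main(void) {
    double x[N];
    for (int i = 0; i < N; i++) x[i] = (double)i;

    /* Reduce: combine the per-task partial statistics. */
    struct stats total = { 0, 0.0, 0.0 };
    for (int t = 0; t < TASKS; t++) {
        struct stats s = map_portion(x, t * N / TASKS, (t + 1) * N / TASKS);
        total.n += s.n;
        total.sum_x += s.sum_x;
        total.sum_xx += s.sum_xx;
    }
    printf("n=%ld sum=%.0f sum of squares=%.0f\n",
           total.n, total.sum_x, total.sum_xx);
    return 0;
}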
Q18. What is the key-based structure that MapReduce uses?
The key-based structure that MapReduce uses fits well with the algorithms of WordCount, MatrixMultiply, StringMatch, and LinearRegression.
Q19. What is the algorithm for calculating the frequency of component occurrences in a bitmap?
The algorithm assigns different portions of the image to different Map tasks, which parse the image and insert the frequency of component occurrences into arrays.
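A hedged sketch of that structure, assuming an 8-bit single-channel bitmap: each "Map task" fills a private frequency array over its portion of the pixels, and the per-task arrays are then merged.

#include <stdio.h>

#define PIXELS 4096
#define TASKS 4
#define BINS 256

int main(void) {
    static unsigned char image[PIXELS];            /* stand-in bitmap data */
    for (int i = 0; i < PIXELS; i++) image[i] = (unsigned char)(i % BINS);

    long local[TASKS][BINS] = {{ 0 }};
    for (int t = 0; t < TASKS; t++) {              /* Map: one portion per task */
        int lo = t * PIXELS / TASKS, hi = (t + 1) * PIXELS / TASKS;
        for (int i = lo; i < hi; i++)
            local[t][image[i]]++;
    }

    long histogram[BINS] = { 0 };                  /* Reduce: merge the arrays */
    for (int t = 0; t < TASKS; t++)
        for (int b = 0; b < BINS; b++)
            histogram[b] += local[t][b];

    printf("pixels with value 0: %ld\n", histogram[0]);
    return 0;
}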