Testing data transformations in MapReduce programs
Citations
Comparative Analysis of Techniques for Big-data Performance Testing
TRANSMUT-SPARK: Transformation Mutation for Apache Spark-Long Version
A Method-Level Test Generation Framework for Debugging Big Data Applications
References
MapReduce: simplified data processing on large clusters
Selecting Software Test Data Using Data Flow Information
Software Testing Research: Achievements, Challenges, Dreams
The HiBench benchmark suite: Characterization of the MapReduce-based data analysis
Frequently Asked Questions
Q2. What have the authors stated as future work in "Testing data transformations in MapReduce programs"?
As future work, the authors plan to apply MRFlow to more programs and to automate the technique in areas such as the test coverage items, the derivation and execution of test cases, and the graph from which these test cases are derived.
Q3. How do the Map and Reduce functions solve the problem?
The Map function receives the input data and emits <key, value> pairs. The Reduce function then receives <key, list(value)> pairs, each of which contains all the information about one subproblem, and solves that subproblem by emitting <key, value> pairs.
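The Map, shuffle and Reduce steps described above can be simulated in a few lines of Python. This is a minimal in-memory sketch, not the Hadoop API: the names `run_mapreduce`, `wordcount_map` and `wordcount_reduce` are hypothetical, and the shuffle is modelled as a dictionary that groups the emitted values by key.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory MapReduce driver (not Hadoop's API)."""
    # Map phase: every input record may emit any number of <key, value> pairs.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            # Shuffle phase: collect the values of each key into <key, list(value)>.
            groups[key].append(value)
    # Reduce phase: each <key, list(value)> subproblem emits <key, value> pairs.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def wordcount_map(line):
    # Emit <word, 1> for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def wordcount_reduce(key, values):
    # Solve the subproblem for one key by adding up its partial counts.
    yield key, sum(values)


print(run_mapreduce(["a b a", "b c"], wordcount_map, wordcount_reduce))
# [('a', 2), ('b', 2), ('c', 1)]
```

Sorting the keys only makes the output deterministic here; Hadoop likewise sorts the intermediate keys before they reach each reducer.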
Q4. Which transformation paths does MRFlow derive for the program?
The program has five transformation paths (tp): one obtained from the transformation of values into sum (tp1), three in which values are not transformed (tp2, tp3 and tp4), and one in which the key is not transformed (tp5).
Q5. Where do the paths under test start and end in the MRFlow graph?
In the MRFlow graph, each path under test starts at the definition of a key or value variable and ends at one of the possible last transformations of that variable.
Q6. Where are MapReduce programs used?
MapReduce programs are often used in Big Data applications [28], which process large amounts of data (Volume) with the required performance (Velocity), and which handle data of different types, from different sources, and without an apparent data model, such as emails or videos (Variety).
Q7. What does the MRFlow graph of the two programs contain?
In the MRFlow graph of both programs, the <key, list(value)> input variables have a single definition, and each program contains three transformations: values transformed into another variable, values not transformed, and key not transformed.
Q8. What output does WordCount produce when a word is followed by punctuation?
If the WordCount program receives "hello, hello, hello", the expected output is hello:3, but the actual output is hello:1 and hello,:2, because the Reduce function receives the invalid key "hello,", which is not a word.
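The failure can be reproduced without a cluster. The sketch below assumes that the faulty WordCount splits its input on whitespace only, which is consistent with the output above; `wordcount_naive` and `wordcount_validated` are hypothetical names, and keeping only alphabetic tokens is just one possible way of validating the key, not a fix taken from the paper.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters; other definitions are possible.
WORD = re.compile(r"[A-Za-z]+")


def wordcount_naive(text):
    # Splits on whitespace only, so "hello," reaches Reduce as its own key.
    return dict(Counter(text.split()))


def wordcount_validated(text):
    # Emits only alphabetic tokens, so every key reaching Reduce is a word.
    return dict(Counter(WORD.findall(text)))


text = "hello, hello, hello"
print(wordcount_naive(text))      # {'hello,': 2, 'hello': 1}
print(wordcount_validated(text))  # {'hello': 3}
```

Used as a test case, the input "hello, hello, hello" with expected output hello:3 exposes the fault: the naive version fails it and the validated version passes it.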
Q9. What causes the failures that MRFlow detects?
The faults are caused by the lack of key validation; applied to other programs, MRFlow could detect other defects related to the transformations of keys and values.
Q10. Why is the quality of MapReduce programs important?
The quality of MapReduce programs is important because they are used in critical sectors such as health (DNA alignment [27]) and security (image processing in ballistics [17]).
Q11. Where is the values variable defined, and what are its last transformations?
The values variable is defined in node 0, and its last transformation is either sum or values itself, depending on whether statement 3 is reached.
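How several paths under test follow from a single definition can be sketched on a small graph. The paper's actual MRFlow graph is not reproduced here: the reducer, its node numbering, the `EDGES` and `TRANSFORMS` tables, and the rule that each edge may be traversed at most once (so the loop is either skipped or entered) are all assumptions made for illustration. Node 0 defines `key` and `values`, and node 3 plays the role of statement 3, transforming `values` into `sum`.

```python
# Hypothetical control-flow graph of a WordCount-style reducer:
#   0: reduce(key, values)   defines key and values
#   1: sum = 0
#   2: for v in values:
#   3:     sum += v          transforms values into sum
#   4: emit(key, sum)
EDGES = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: []}
TRANSFORMS = {3: ("values", "sum")}  # node -> (source variable, target variable)


def enumerate_paths(start, end, edges):
    """Depth-first enumeration of paths from start to end, each edge used at most once."""
    paths = []

    def visit(node, path, used):
        if node == end:
            paths.append(path)
            return
        for nxt in edges[node]:
            edge = (node, nxt)
            if edge not in used:
                visit(nxt, path + [nxt], used | {edge})

    visit(start, [start], frozenset())
    return paths


def last_transformation(path, variable, transforms):
    # Follow the variable along the path; it stays itself if never transformed.
    current = variable
    for node in path:
        source, target = transforms.get(node, (None, None))
        if source == current:
            current = target
    return current


for path in enumerate_paths(0, 4, EDGES):
    print(path, last_transformation(path, "values", TRANSFORMS),
          last_transformation(path, "key", TRANSFORMS))
# [0, 1, 2, 3, 2, 4] sum key
# [0, 1, 2, 4] values key
```

Bounding each edge to one traversal keeps the set of paths finite; iterating the loop more than once would not change the last transformation of values, which would still be sum.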
Q12. What connector indicates that a transformation exists with both elements of the sequence?
The conjunction connector indicates that a single transformation involves both elements of the sequence, whereas the disjunction connector indicates that several transformations exist, one for each part of the sequence.
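The two connectors can be read as rules for expanding a sequence of source variables into transformations. The representation below is a hypothetical sketch, not MRFlow's notation: a transformation is modelled as a pair (sources, target), and the connector names `AND` and `OR` and the function `expand` are assumptions.

```python
def expand(sources, connector, target):
    """Expand a connector over a sequence of source variables into transformations."""
    if connector == "AND":
        # Conjunction: one transformation that uses every element of the sequence.
        return [(tuple(sources), target)]
    if connector == "OR":
        # Disjunction: one separate transformation for each element of the sequence.
        return [((source,), target) for source in sources]
    raise ValueError(f"unknown connector: {connector!r}")


print(expand(["a", "b"], "AND", "c"))  # [(('a', 'b'), 'c')]
print(expand(["a", "b"], "OR", "c"))   # [(('a',), 'c'), (('b',), 'c')]
```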
Q13. What are transformation paths?
The paths under test, called transformation paths (tp), are extracted from the transformations between the input and the output in the MRFlow graph.
Q14. To which programs is MRFlow applied?
To explore the applicability of the testing technique, MRFlow is applied to two popular programs: WordCount [3], which counts the occurrences of each word in a text, and IPCountry [24], which counts the number of IP (Internet Protocol) addresses in each country.
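The IPCountry computation can be sketched in the same spirit, with key validation built in. The names `COUNTRY_BY_NETWORK`, `country_of` and `ip_country` are hypothetical; the lookup table uses documentation address ranges and placeholder country names instead of a real geolocation database, and the sketch assumes one IP address per input line, counting every occurrence. Invalid addresses are rejected with the standard `ipaddress` module instead of being emitted as keys.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table; a real IPCountry job would query a geolocation database.
COUNTRY_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Country A",
    ipaddress.ip_network("198.51.100.0/24"): "Country B",
}


def country_of(raw_ip):
    # Validate the key before emitting it: malformed addresses are dropped.
    try:
        ip = ipaddress.ip_address(raw_ip.strip())
    except ValueError:
        return None
    for network, country in COUNTRY_BY_NETWORK.items():
        if ip in network:
            return country
    return None  # valid address, but not covered by the lookup table


def ip_country(lines):
    # Map: emit <country, 1> per valid address; Reduce: sum the counts per country.
    counts = Counter()
    for line in lines:
        country = country_of(line)
        if country is not None:
            counts[country] += 1
    return dict(counts)


print(ip_country(["192.0.2.1", "192.0.2.7", "198.51.100.3", "not-an-ip"]))
# {'Country A': 2, 'Country B': 1}
```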
Q15. What is the purpose of this paper?
To detect these defects, this paper proposes a testing technique that analyzes the program transformations that could produce the failures.