
Testing MapReduce programs: A systematic mapping study

TL;DR: MapReduce is a processing model used in Big Data that facilitates the analysis of large amounts of data under a distributed architecture that simplifies their management.
Abstract: Work supported in part by projects TIN2016-76956-C3-1-R and TIN2013-46928-C3-1, funded by the Ministry of Economy and Competitiveness of Spain, and by GRUPIN14-007, funded by the Principality of Asturias (Spain) and FEDER.

Summary

1 INTRODUCTION

  • Big Data or Data-intensive programs are those that cannot run using the traditional technology/techniques 1 and usually need novel approaches.
  • There are not many studies related to MapReduce applications.
  • This interest in Big Data in recent years may have advanced the state of the art of software testing in MapReduce programs.
  • In contrast to the aforementioned mapping study, this paper obtains more thorough results because of its deeper scope and different approach/motivation.
  • The research questions are proposed in Section 3 together with the systematic steps planned to answer them.

2 MAPREDUCE PROCESSING MODEL

  • The MapReduce programs 2 divide one problem into several subproblems that are executed in parallel over a large number of computers.
  • To illustrate MapReduce, let us imagine a program that calculates the average temperature per year.
  • The Map function receives years with temperatures and creates the <key, value> pairs in order to group the temperatures per year (a minimal code sketch of this example follows the list below).
  • The programs are executed by a framework that automatically manages the resource allocation, the re-execution of one part of the program in case of infrastructure failures, and the scheduling of all executions, among other mechanisms.
  • Another problem is that new raw data are continuously generated and the data model could change over time, so the program may need changes.
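The following is a minimal, illustrative sketch of how this average-temperature-per-year example could be written against the Hadoop MapReduce Java API. It is not taken from the paper: the class names, the assumed input format (one "year,temperature" record per line) and the command-line paths are assumptions made only for illustration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageTemperature {

  // Map: each input line is assumed to be "year,temperature"; the year becomes the key
  // so that all temperatures of the same year are grouped for one Reduce call.
  public static class TemperatureMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",");
      context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
    }
  }

  // Reduce: receives one year with all of its temperatures and emits the average.
  public static class AverageReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text year, Iterable<DoubleWritable> temps, Context context)
        throws IOException, InterruptedException {
      double sum = 0;
      long count = 0;
      for (DoubleWritable t : temps) {
        sum += t.get();
        count++;
      }
      context.write(year, new DoubleWritable(sum / count));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "average temperature per year");
    job.setJarByClass(AverageTemperature.class);
    job.setMapperClass(TemperatureMapper.class);
    job.setReducerClass(AverageReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}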

3 PLANNING OF THE MAPPING STUDY

  • This mapping study aims to characterize the knowledge of software testing approaches in the MapReduce programs through an empirical study of the research literature.
  • To avoid bias, the planning of the mapping study describes several tasks based on Kitchenham et al. guidelines 22: 1. Formulation of the research questions (Subsection 3.1).
  • The search process to extract the significant literature (primary studies) to answer the research questions (Subsection 3.2).
  • Data synthesis to summarize, mix and put the data into context to answer the questions (Subsection 3.4).
  • These tasks are planned and then conducted independently as described in Figure 2.

3.1 Research Questions

  • The research questions are formulated to cover all the information about software testing research in the context of the MapReduce programs with different points of view.
  • This work formulates the research questions based on the 5W+1H model 37,38, also known as the Kipling method 39.
  • This method is used in other software engineering empirical studies 40,41 and answers the questions: Why, What, How, Where, When and Who.
  • The research questions of this mapping study are: RQ1 (Why is testing performed in the MapReduce programs?), RQ2 (What testing is performed?), RQ3 (How is testing performed?) and RQ4 (Who, where and when is testing performed?).

3.2 Search Process

  • The mapping study answers the research questions based on a series of studies that contain relevant information about these questions.
  • These studies are called primary studies and are obtained through the tasks described in Figure 3.
  • First, the search terms (set of several words/terms) related to software testing and MapReduce are searched for in different data sources (journals, conferences and electronic databases).
  • The papers that match these searches together with other studies recommended by experts constitute the potential primary studies.
  • Finally, these studies are filtered in the study selection in order to obtain only the studies that contain information to answer the research questions.

3.2.1 Search Terms

  • The search terms are obtained from the three points of view proposed by Kitchenham et al. 22: (1) population, that refers to the technologies and areas related to MapReduce, (2) intervention, that are the issues related to software testing, and (3) outcomes, that are the improvements obtained through software testing.
  • Hadoop is a distributed system that supports the execution of MapReduce programs and non-MapReduce programs, but there are several papers that use the Hadoop and MapReduce words interchangeably in the title.
  • In order to obtain the maximum relevant literature and avoid missing some primary studies due to the buzzwords and jargon, a thorough search is performed considering the MapReduce and Big Data related technologies enumerated in Table 1.
  • This work performs a wide search with 9384 combinations of terms in the paper title, obtained from 92 MapReduce technology related terms and 102 quality related terms.

3.2.2 Data Sources

  • The potential primary studies may be in different data sources.
  • This category contains 624 proceedings/volumes from the year of the MapReduce paper (2004) to June 2016.
  • The opinion of authors with experience in software testing and MapReduce, together with the other related mapping study 25 could provide potential primary studies.
  • To avoid this problem, the authors created a program that splits the 9384 combinations of search terms into 2346 searches and simulates a human performing these requests.

3.2.3 Study Selection

  • Some potential primary studies obtained from the data sources might not contain information about software testing in the MapReduce programs.
  • The filters consist of the following criteria applied in the following order: C1) Filter by year.
  • The potential primary studies are excluded when they do not contain Big Data information.
  • Some other papers employ the MapReduce and Big Data capabilities to speed up testing in other non-MapReduce programs.
  • This mapping study performs a wide search with more than 70000 research papers found before the filter C1.

3.3 Data Extraction

  • The relevant information of the primary studies is extracted through a template divided into two parts.
  • The first checklist contains the following roles: Manager, Analyst, Architect, Tester, Test manager, Test strategist, Other stakeholders, Unclear and Not applicable.
  • These data are extracted in a checklist with the following information about the research validation of the studies.

3.4 Data Synthesis

  • The data extracted from the primary studies are synthesized in order to answer the research questions.
  • In empirical software engineering there are several synthesis methods 62 based on different approaches according to the type of data or research questions, among others.
  • A second method, meta-ethnography 64, is used for the remaining research questions.
  • A group of labels is created for each previous segment/phrase based on the type of reason for testing.
  • Once the data are extracted from the primary studies in these checklists, the research questions are answered by a frequency analysis (a small illustrative sketch follows this list).
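As an illustration of the kind of frequency analysis used to answer the research questions, the labels extracted into the checklists can simply be counted and reported as percentages of the primary studies. This sketch is our own, with invented labels, and is only meant to show the mechanics:

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyAnalysis {
  public static void main(String[] args) {
    // One entry per primary study: the labels assigned to it during data extraction (invented examples).
    List<List<String>> studies = List.of(
        List.of("performance related", "tester"),
        List.of("functional suitability", "tester"),
        List.of("performance related", "developer"));

    // Count how many times each label occurs across the studies.
    Map<String, Integer> counts = new TreeMap<>();
    for (List<String> labels : studies) {
      for (String label : labels) {
        counts.merge(label, 1, Integer::sum);
      }
    }

    // Report each label as a percentage of the primary studies.
    counts.forEach((label, count) ->
        System.out.printf("%s: %.2f%% of the studies%n", label, 100.0 * count / studies.size()));
  }
}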

3.5 Limitations of the Systematic Mapping Study

  • Another, less important, potential bias could occur during the search process if some primary studies are not found through the search terms or the expert opinions.
  • This study searches 5 electronic databases and 2311 proceedings/volumes related to software testing in the MapReduce programs (Data sources).
  • The mapping study excludes the non-relevant studies based on 4 filters (Study selection).
  • The majority of the data extracted are based on checklists, in some cases obtained from international standards and in others created or adapted to the MapReduce processing model (Data extraction).

4 RESULTS

  • The results are obtained through the conducting of the systematic mapping study that answers the research questions based on the planning of Section 3.
  • From them, the data are extracted, and the synthesis is developed in Subsection 4.2.
  • Finally, the general results are discussed in Subsection 4.3.

4.1 Primary Studies

  • In this work there are 54 primary studies derived from more than 70000 potential studies obtained through the search process detailed in Figure 4.
  • The MapReduce processing model was described in 2004, but the software testing efforts in this field according to the primary studies started in 2010 with only 1 study, and after six years and six months the number of primary studies has increased to 54.
  • The different types of validations employed in the research are summarized in Table 4.
  • Testing in Big Data has opened up new challenges 67, especially in the understanding of the data and its complex structures 68.
  • In the case of high Volume, it could be difficult to check whether the test case output is the expected one, and the use of automatic tools can be helpful 69.

4.1.1 Performance testing and analysis

  • These prediction models characterize the performance based on different kinds of input parameters.
  • The prediction models can have different goals beyond the execution time, for example the Yang et al. model 74 helps to obtain the values of the input parameters that achieve the best execution time.
  • While some models predict the performance by analyzing the execution time of several samples 78 or by considering the previous executions 79, other models consider some specific characteristics of the MapReduce execution (a much simplified sketch of the sample-based idea follows this list).
  • The Vianna et al. model 80 considers the influence on performance of the MapReduce tasks that are executed in parallel.
  • The tester can also monitor the execution of the MapReduce programs and test cases, obtaining charts to evaluate the performance and potential bottlenecks 102.
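The sketch below is a deliberately naive illustration of sample-based performance prediction: execution times measured on small input samples are fitted with least squares and extrapolated to the full input size. The sample values are invented, and the actual prediction models in the primary studies are far more elaborate.

public class RuntimePrediction {
  public static void main(String[] args) {
    // Invented sample runs: input size of each sampled execution and its measured runtime.
    double[] inputSizeGb = {1, 2, 4, 8};
    double[] runtimeSec = {40, 70, 135, 260};

    // Least-squares fit of runtime = a + b * inputSize.
    int n = inputSizeGb.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
      sumX += inputSizeGb[i];
      sumY += runtimeSec[i];
      sumXY += inputSizeGb[i] * runtimeSec[i];
      sumXX += inputSizeGb[i] * inputSizeGb[i];
    }
    double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    double a = (sumY - b * sumX) / n;

    // Extrapolate to the size of the complete data set.
    double fullSizeGb = 500;
    System.out.printf("Predicted runtime for %.0f GB: %.0f s%n", fullSizeGb, a + b * fullSizeGb);
  }
}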

4.1.2 Functional testing

  • Misconfiguration is one of the most common problems that lead to memory/performance issues in MapReduce 104.
  • According to an empirical study 105, users rarely tune the configuration parameters that are related to performance.
  • Another technique to detect the faults caused by non-determinism dynamically checks the properties of the program under test with random data 113 (an illustrative sketch of this idea follows the list).
  • There are several studies that propose to inject infrastructure failures in the test case design 114.
  • Another technique to generate the data of the test cases employs a bacteriological algorithm aimed at killing semantic mutants specific to MapReduce: it varies both the number of Reducers and the presence or absence of the Combiner functionality 117.
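The sketch below is our own illustration (not a tool from the primary studies) of a property check with random data, written with MRUnit and reusing the illustrative TemperatureMapper and AverageReducer classes sketched earlier: instead of asserting an exact expected output, the test generates random temperatures and asserts a property that must always hold, namely that the computed yearly average lies between the minimum and maximum generated values.

import java.util.List;
import java.util.Random;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Assert;
import org.junit.Test;

public class AveragePropertyTest {

  @Test
  public void averageStaysWithinTheGeneratedRange() throws Exception {
    MapReduceDriver<LongWritable, Text, Text, DoubleWritable, Text, DoubleWritable> driver =
        MapReduceDriver.newMapReduceDriver(
            new AverageTemperature.TemperatureMapper(),
            new AverageTemperature.AverageReducer());

    // Feed 100 random temperatures for the same year and remember the generated range.
    Random random = new Random(42);
    double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
    for (int i = 0; i < 100; i++) {
      double temperature = -30 + 70 * random.nextDouble();
      min = Math.min(min, temperature);
      max = Math.max(max, temperature);
      driver.withInput(new LongWritable(i), new Text("2000," + temperature));
    }

    // The property: exactly one output pair, and its average lies within the generated range.
    List<Pair<Text, DoubleWritable>> output = driver.run();
    Assert.assertEquals(1, output.size());
    double average = output.get(0).getSecond().get();
    Assert.assertTrue("average outside the generated range", average >= min && average <= max);
  }
}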

4.2 Synthesis

  • The primary studies contain the answers to the research questions, but this information is dispersed within them.
  • The synthesis obtains valuable information in order to answer the research questions based on the data extracted from the primary studies.
  • The data are extracted following the template defined in Subsection 3.3 and then synthesized by the methods described in Subsection 3.4.
  • In the following subsections the primary studies are analyzed, classified and summarized in order to obtain the answer to each research question systematically.

4.2.1 RQ1Why is testing performed in the MapReduce programs?

  • The MapReduce programs are tested for several reasons.
  • Failure related: the specific faults of the MapReduce programs and the number of programs that fail in production.
  • In Table 5 each reason for testing is also classified based on the degree of formality of the evidence in accordance with the following types: reasons with formal evidence and with informal evidence.
  • Influence of the infrastructure on application performance: whereas the MapReduce applications can be designed without considering the infrastructure, the program performance is influenced by the production infrastructure.
  • From the 41 “performance related” reasons for testing, the most frequent are focused on the analysis (36.83% of “performance related” reasons) and optimization of the performance (26.83% of “performance related” reasons), followed by the fulfillment of the performance goals (24.39% of “performance related” reasons).

4.2.2 RQ2What testing is performed in the MapReduce programs?

  • The planning of Subsection 3.4 proposes a meta-ethnography 64 to answer this research question.
  • The data extracted from each primary study have two facets in order to answer RQ2: a. Quality (sub)characteristics for each study according to ISO/IEC 25010:2011 42, represented in Table 7; b. Quality-Related Types of Testing proposed in each study based on ISO/IEC/IEEE 29119-4:2015 58, summarized in Table 8.
  • The majority of efforts are focused on “performance efficiency” with 64.81% of the studies, then on “functional suitability” with 25.93% of the studies, and finally on “reliability” with 5.56% of the studies.
  • Regarding the type of testing, 59.26% apply “performance-related testing”, 22.22% employ “functional testing” and 3.7% use “backup/recovery testing”.
  • The results obtained through the combination of both facets are more or less those expected: “performance-related testing” is used for “performance efficiency” characteristics, “functional testing” for “functional suitability”, and “backup/recovery testing” for “reliability”.

4.2.3 RQ3 How is testing performed in the MapReduce programs?

  • This research question is answered through the meta-ethnography 64 proposed in Subsection 3.4.
  • In order to answer RQ3, the primary studies are analyzed considering three facets.
  • a. Testing methods/techniques are summarized in Table 9 according to the test activities proposed in Annex A of ISO/IEC/IEEE 29119-1:2013 9.
  • b. Dependency between the primary studies and the MapReduce processing model is depicted in Table 10.
  • c. Tools created or used in the primary studies to perform software testing are characterized in Table 11.
  • Other testing activities are used to a lesser degree, such as “structure based” in 7.41% of the studies or static analysis in 5.56% of the studies.

4.2.4 RQ4Who, where and when is testing performed in the MapReduce programs?

  • The planning of the mapping study described in Subsection 3.4 proposes a meta-ethnography 64 to answer the research question through three facets.
  • a. The different roles that participate in the testing efforts of the MapReduce programs are described in Table 12.
  • To a lesser extent, the testing efforts are oriented towards the parts of the program that might not contain MapReduce functions: 7.41% of the studies consider the integration testing between the MapReduce functions and other parts of the program, and 3.7% of the studies address testing of the system.
  • From these results, it appears that the fulfillment of the contract or user requirements tested in the acceptance testing level is not greatly affected by the existence of MapReduce functions in the system.
  • Regardless of the test level, the testing described in the primary studies is mainly performed in the Software/System Qualification Testing Process.

4.3 Discussion of Results

  • The research questions of Subsection 3.1 are answered through the primary studies, data extraction and data synthesis.
  • The most frequent reasons are based on performance issues (analyze, optimize and fulfil performance goals), existence of several and specific failures, the type and quality of the data processed by these programs, and testing to predict and select efficiently the resources.
  • According to Table 8, the studies related to functionality only represent 22.22%, even though 42.17% of the reasons for testing are related to functionality.
  • This classification reflects the research efforts to boost the Big Data Engineering field because 44.1% of the studies improve the technology, 18.31% analyse the technology through studies and surveys, 9.01% create new technologies to manage and analyse data, and 6.62% are focused on the state-of-the-art and challenges.

5 CONCLUSIONS

  • The number of studies on software testing in the MapReduce programs has increased during recent years.
  • A characterization is carried out based on 54 research studies obtained from more than 70000 potential papers.
  • These reasons for testing assume that both functional and performance testing are necessary, but the studies employ different approaches: functional testing considers different aspects of the program (such as specification and structure) while performance testing is more focused on simulation and evaluation.
  • Regardless of the type of testing, the majority of efforts are specific to the MapReduce technology, at the unit and integration levels of the Map and Reduce functions.
  • There is room to mature with better validations and thus improve the research impact.


PERSPECTIVE ARTICLE
Testing MapReduce Programs: A Systematic Mapping Study
Jesús Morán* | Claudio de la Riva | Javier Tuya
Department of Computing, University of Oviedo, Asturias, Spain
Correspondence: *Jesús Morán, Department of Computing, University of Oviedo, Gijón, Spain. Email: moranjesus@uniovi.es
Present address: Campus de Viesques, 33394, Gijón, Spain
Summary
MapReduce is a processing model used in Big Data to facilitate the analysis of large data under a distributed architecture with scale and fault tolerance mechanisms. These programs are considered critical for several enterprises because they generate high revenues. In order to guarantee their quality, researchers have proposed several software testing techniques and tools. This paper characterizes their state-of-the-art, identifying the trends and gaps through a mapping study. The research literature of this topic is analyzed and synthesized systematically, finding that the main testing efforts are carried out by the tester in order to test the performance and, to a lesser degree, the program functionality. The principal reasons for testing the programs are performance issues, potential failures, issues related to the data, or to satisfy the agreements with efficient resources. The performance testing is carried out through simulation and evaluation, whereas the functional testing considers some program characteristics (such as specification and structure). Despite the fact that functionality is relevant to satisfy the business requirements, few studies are focused on functional testing, which can indicate a potential research challenge. In addition, there is room to improve the software testing research in the MapReduce applications through more mature and standard validation methods.
KEYWORDS:
Software testing, Systematic mapping study, MapReduce, Big Data Engineering
1 INTRODUCTION
Big Data or Data-intensive programs are those that cannot run using the traditional technology/techniques 1 and usually need novel approaches. MapReduce is one of the most important processing models used in Big Data, based on the “divide and conquer” principle 2. These programs run two functions in a distributed infrastructure: the Map function splits one problem into several subproblems (divide) and the Reduce function solves each subproblem (conquer). There are several technologies to execute and manage MapReduce programs, such as Spark 3, Flink 4 and Hadoop 5, all broadly implemented in industry 6. It is necessary to ensure the quality of these programs, especially those employed in critical sectors like health or security, such as DNA alignment with MapReduce 7 and image processing in ballistics with MapReduce 8. These new approaches to process large data in general, and MapReduce in particular, have several characteristics that could have an impact on program quality, for example: (1) analysis of large quantities of data, (2) variety of the input information, (3) data without an apparent data model (schema-less), (4) program optimizations to obtain better performance, (5) implementation of the data models in each program (schema-on-read), (6) execution over heterogeneous infrastructure, and (7) automatic mechanisms to manage the resources (for example, scaling and fault tolerance).

There are several approaches to improve the quality, and software testing is one of the most used. According to the ISO/IEC/IEEE 29119-1:2013 standard 9, software testing aims to provide information about the program quality and the potential impacts/risks of poor quality. Software testing research has evolved in recent years 10, but there are several challenges to test programs in cloud and adaptive architectures 11.

The adoption of and interest in these technologies/paradigms have increased over the last few years to the extent that several Fortune 1000 enterprises consider Big Data critical for business 12. Despite the importance of these applications, some studies forecast that 60% of Big Data projects fail to go beyond piloting and will be abandoned during 2017 13. There are several challenges and concerns: poor data quality 14,15, lack of technological skills 16,17, and, among others, different technological issues such as complexity 18, maturity 19, operability 20 and technical problems 14. Some of the previously stated problems complicate the development, and the MapReduce application could be implemented with faults. Although software testing is one of the quality assurance techniques most used to evaluate software products, there are not many studies related to MapReduce applications. The contribution of this paper is an evaluation and characterization of the state of the art of software testing in the MapReduce applications through a systematic mapping study 21,22,23. In this type of study, research questions are proposed and then answered based on research studies.

A mapping study by Sharma et al. 24 on Big Data and Hadoop 5 indicates that the number of papers has increased significantly in recent years. This interest in Big Data during the previous years may have advanced the state of the art of software testing in the MapReduce programs. Another mapping study was elaborated in 2013 by Camargo et al. 25 on software testing in MapReduce programs. Their study analyses only 14 papers and the results are focused on what types of faults the MapReduce programs have, how to perform the tests, and the tools and testing techniques used. In contrast to the aforementioned mapping study, this paper obtains more thorough results because of its deeper scope and different approach/motivation. The main differences between this mapping study and that of Camargo et al. 25 are: (1) broader research questions to analyze the software testing field in a more holistic way than ad-hoc or specific research questions, (2) broader and more general results obtained through the research questions, (3) relevant literature obtained through a large search involving more sources, (4) almost quadruple the number of papers analyzed in depth to improve the results, (5) deeper synthesis analysis of the papers based on several international standards in order to obtain accurate results, and (6) inclusion of the recent research lines.

The paper continues as follows. Section 2 introduces MapReduce and describes the main challenges from the testing point of view. The research questions are proposed in Section 3 together with the systematic steps planned to answer them. The answers and other results are detailed in Section 4. Finally, Section 5 contains the conclusions.
2 MAPREDUCE PROCESSING MODEL
The MapReduce programs 2 divide one problem into several subproblems that are executed in parallel over a large number of computers. There are two principal functions: (1) Map, that analyses part of the input data and classifies them into subproblems, and (2) Reduce, that solves each subproblem. The data processed by these functions are in the form of <key, value> pairs, in which the key is the identifier of each subproblem and the value contains the information needed to solve it. To illustrate MapReduce, let us imagine a program that calculates the average temperature per year. This problem could be divided into one subproblem per year; then the key (identifier of the subproblem) is the year, and the value (information of the subproblem) is the temperature. Figure 1 details a distributed execution of the program analyzing the years 2000-2003. Firstly, two computers analyze the data, but one computer fails and this analysis is re-executed in a third computer. The Map function receives years with temperatures and creates the <key, value> pairs in order to group the temperatures per year. Then the Reduce function receives from all Maps one year with all of its temperatures, and calculates the average.

FIGURE 1 Example of the MapReduce program that calculates the average temperature per year.

The programs are executed by a framework that automatically manages the resource allocation, the re-execution of one part of the program in case of infrastructure failures, and the scheduling of all executions, among other mechanisms. The data analyzed could be stored in several distributed sources, such as non-relational databases and distributed file systems.

The integration of all of these technologies in the MapReduce program stack could be a challenge for developers and testers. Some technologies do not scale well, do not support indexing, or do not support ACID transactions, among others. Another challenge is the implementation of the data model in the program. MapReduce can analyze raw data without a data model (schema-less or unstructured) because the modelling of the data is codified in the program (schema-on-read). At the large data scale it is difficult to establish a model for all data and there are several issues related to poor data quality, such as missing data, noise or incorrect data. Another problem is that new raw data are continuously generated and the data model could change over time, so the program may need changes.

The balance and the statistical properties of the data can also change over time and they can affect the program, especially if there are performance optimizations in the code based on data property assumptions. For example, suppose that in the program that analyzes the average temperature per year, the last two years contain 80% of the data. In this case there could be at least two issues: (1) performance problems if these two years are analyzed in the same computer, and (2) memory leaks or resource issues due to the high quantity of data analyzed by one computer. A further challenge is the type of processing implemented: originally MapReduce analyzed the data only in batches, but nowadays there are streaming or iterative approaches, among others. For example, the temperature sensors create streams of data, so the calculation of the average temperature is more efficient using a streaming approach, but it is more difficult to implement and not all programs can be processed in this way. In some domains it is better to change the <key, value> approach to another that allows the program to be modelled better, such as Pangool 26 that uses tuples, or more complex structures like graphs 27.

In the main framework of MapReduce, Hadoop, there are a lot of configuration parameters that could affect the execution in terms of resources, data replication and so on. More than 25 of these parameters are significant in terms of performance 28. The developer does not know the resources available when the program is deployed because the cluster continuously changes (new resources added to scale, or infrastructure failures 29), and this also makes the optimal configuration difficult. There are other advanced functionalities of MapReduce that could optimize the program, such as the Combine function. The problem is that if these functionalities are not well established there could be side effects, such as incorrect output.
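As a concrete illustration of such a side effect (this worked example is ours, not taken from the paper), consider reusing the average-computing reducer of the earlier temperature sketch as a Combiner. The Combiner runs locally on each computer's Map output, so the final Reducer would receive partial averages and compute an average of averages, which in general is not the overall average:

// Hypothetical misuse in the job driver of the earlier AverageTemperature sketch:
job.setCombinerClass(AverageTemperature.AverageReducer.class);   // incorrect for averages

// Why the output becomes wrong, assuming year 2000 has temperatures 10 and 20 on one
// computer and 30 on another:
//   with this combiner:    reduce({(10+20)/2, 30}) = (15 + 30) / 2 = 22.5
//   without the combiner:  reduce({10, 20, 30})    = (10 + 20 + 30) / 3 = 20.0
// A safe design would emit (sum, count) pairs from the Combiner and divide only in the Reducer.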
Also in Big Data there are other testing issues related to the ethical use of data. Different security procedures and policies should be considered in the MapReduce programs throughout the data lifecycle. For example, the analysis of some data could be forbidden in the next season due to agreements with the data provider or legal issues. In other cases, the data should be anonymized or encrypted, especially the sensitive data.

Several generic tools are used in the industry to test the MapReduce programs, such as JUnit 30 with mocks. In order to facilitate the testing of the MapReduce programs, MRUnit 31 runs the unit test cases without a cluster infrastructure. Another approach is MiniCluster 32, which simulates a cluster infrastructure in memory, or Herriot 33, which interacts with real infrastructure allowing more fine-grained control, for example by the injection of computer failures that alter the execution of the program. There are different types of infrastructure failures that affect the test execution, and several tools simplify their injection, such as AnarchyApe 34, ChaosMonkey 35 or the Hadoop Injection Framework 36. The remainder of the paper analyses and summarizes the efforts of the research studies that are focused on covering the issues related to testing the MapReduce applications.
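The fragment below is a hedged sketch (ours, not from the paper) of what a unit test with MRUnit could look like for the illustrative TemperatureMapper and AverageReducer classes sketched earlier; it runs the Map and Reduce functions in isolation, without any cluster:

import java.util.Arrays;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class AverageTemperatureTest {

  @Test
  public void mapEmitsTheYearAndTheTemperature() throws Exception {
    // The Map function should group each temperature under its year.
    MapDriver.newMapDriver(new AverageTemperature.TemperatureMapper())
        .withInput(new LongWritable(0), new Text("2000,10.0"))
        .withOutput(new Text("2000"), new DoubleWritable(10.0))
        .runTest();
  }

  @Test
  public void reduceComputesTheAverage() throws Exception {
    // The Reduce function should emit the average of all temperatures of the year.
    ReduceDriver.newReduceDriver(new AverageTemperature.AverageReducer())
        .withInput(new Text("2000"),
            Arrays.asList(new DoubleWritable(10.0), new DoubleWritable(20.0), new DoubleWritable(30.0)))
        .withOutput(new Text("2000"), new DoubleWritable(20.0))
        .runTest();
  }
}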
3 PLANNING OF THE MAPPING STUDY
This mapping study aims to characterize the knowledge of software testing approaches in the MapReduce programs through an empirical study of the research literature. To avoid bias, the planning of the mapping study describes several tasks based on the Kitchenham et al. guidelines 22:
1. Formulation of the research questions (Subsection 3.1).
2. The search process to extract the significant literature (primary studies) to answer the research questions (Subsection 3.2).
3. Data extraction to obtain the relevant data from the literature (Subsection 3.3).
4. Data synthesis to summarize, mix and put the data into context to answer the questions (Subsection 3.4).
These tasks are planned and then conducted independently as described in Figure 2. The confidence of the results obtained from the planning of the mapping study is discussed in Subsection 3.5.

FIGURE 2 Steps of Systematic Mapping Study.

3.1 Research Questions
The research questions are formulated to cover all the information about software testing research in the context of the MapReduce programs with different points of view. This work formulates the research questions based on the 5W+1H model 37,38, also known as the Kipling method 39. This method is used in other software engineering empirical studies 40,41 and answers the questions: Why, What, How, Where, When and Who. The research questions of this mapping study are:
RQ1. Why is testing performed in the MapReduce programs?
RQ2. What testing is performed in the MapReduce programs?
RQ3. How is testing performed in the MapReduce programs?
RQ4. Who, where and when is testing performed in the MapReduce programs?
3.2 Search Process
The mapping study answers the research questions based on a series of studies that contain relevant information about these questions. These studies are called primary studies and are obtained through the tasks described in Figure 3. First, the search terms (set of several words/terms) related to software testing and MapReduce are searched for in different data sources (journals, conferences and electronic databases). The papers that match these searches together with other studies recommended by experts constitute the potential primary studies. Finally, these studies are filtered in the study selection in order to obtain only the studies that contain information to answer the research questions. In the following subsections each of the planning steps is described in detail.

FIGURE 3 Search process to obtain the primary studies in the mapping study.
3.2.1 Search Terms
The search terms are obtained from the three points of view proposed by Kitchenham et al. 22: (1) population, that refers to the technologies and areas related to MapReduce, (2) intervention, that are the issues related to software testing, and (3) outcomes, that are the improvements obtained through software testing.
The search terms of this mapping study follow the chain “MapReduce technology related terms AND Quality related terms”, where:
The MapReduce technology related terms correspond with population and are enumerated in Table 1 with synonyms. The Big Data paradigm and the MapReduce processing model are surrounded by a lot of buzzwords, like other fields such as Cloud computing. For example, Hadoop is a distributed system that supports the execution of MapReduce programs and non-MapReduce programs, but there are several papers that use the Hadoop and MapReduce words interchangeably in the title. Other relevant papers do not include the MapReduce word in the title, but contain other words related to the MapReduce/Big Data ecosystem like Hive, Pig or Spark, among others. In order to obtain the maximum relevant literature and avoid missing some primary studies due to the buzzwords and jargon, a thorough search is performed considering the MapReduce and Big Data related technologies enumerated in Table 1.
The quality related terms correspond with the Quality (sub)characteristics of ISO/IEC 25010:2008-2011 42 and ISO/IEC 9126-1:2001 43 with synonyms (outcome), together with other testing terms (intervention). Both are enumerated in Table 2.
This work performs a wide search with 9384 combinations of terms in the paper title, obtained from 92 MapReduce technology related terms and 102 quality related terms.

TABLE 1 MapReduce technology related terms (population).
Technology: Terms and years of creation
Field: Big Data, Massive data, Large data
Data processing: Hadoop (2006)
- Batch: MapReduce (2004)
- Iterative: Spark (2013), Tez (2013), Stratosphere (2010), Dryad (2007), Flink (2014)
- Streaming: Storm (2011), S4 (2010), Samza (2013)
- Lambda: Lambdoop (2013), Summingbird (2013)
- BSP: Giraph (2013), Hama (2011)
- Interactive: Drill (2012), Impala (2012)
- MPI: Hamster (2011)
Testing: MRUnit (2009), JUnit (1998), Mock, MiniMRCluster (2006), MiniYarnMRCluster (2012), Mini cluster (2007), QuerySurge (2011)
Security: Sentry (2013), Kerberos (2007), Knox (2013), Argus (2014)
Resource Manager: Yarn (2012), Corona (2012), Mesos (2009)
MapReduce abstraction: Pig (2008), Hive (2010), Jaql (2008), Pangool (2012), Cascading (2010), Crunch (2011), Mahout (2010), DataFu (2010)
Yarn frameworks: Twill (2013), Reef (2013), Spring (2013)
Yarn integration: Slider (2014), Hoya (2013)
Data integration: Flume (2010), Sqoop (2009), Scribe (2007), Chukwa (2009), Hiho (2010)
Workflow: Oozie (2010), Hamake (2010), Azkaban (2012), Luigi (2012)
Coordinator: Zookeeper (2008), Doozerd (2011), Serf (2013), Etcd (2013)
SDK: Hue (2010), HDInsight (2012), Hdt (2012)
Serialization: Sequence File (2006), Avro (2009), Thrift (2007), Protobuf (2008)
Cluster Management: Ambari (2011), StackIQ (2011), White Elephant (2012), Ganglia (2007), Cloudera Manager (2011), Hprof (2007), MRBench (2008), HiBench (2010), GridMix (2007), PUMA (2012), SWIM (2011)
Filesystem: HDFS (2006), S3 (2006), Kafka (2011), GFS (2003), GPFS (2006), CFS (2013)
Other storage: HBase (2008), Parquet (2013), Accumulo (2008), Hcatalog (2011)
Cluster deployment: Bigtop (2011), Buildoop (2014), Whirr (2010)
Data Lifecycle: Falcon (2013)
3.2.2 Data Sources
The potential primary studies may be in different data sources. This mapping study searches for the studies in the following data sources grouped in four categories:
a) High impact journals and conferences. The potential studies are obtained through DBLP 44 with the search terms in 31 JCR journals 45 and 53 CORE conferences 46 enumerated in Appendix A. The journals and conferences selected are related to software testing or Big Data. This category contains 624 proceedings/volumes from the year of the MapReduce paper (2004) to June 2016.
b) Electronic databases. The search terms are queried in IEEE Xplore 47, ACM Digital Library 48, Scopus 49, Ei Compendex 50 and ISI Web of Science 51, which are employed in other mapping studies of software testing 52.
c) Other journals and conferences. The non-JCR journals and non-CORE conferences related to software testing or Big Data could be a good source of potential primary studies. This mapping study searches for studies through DBLP 44 with the search terms in the 33 journals and 49 conferences enumerated in Appendix B. This category contains 1687 proceedings/volumes to search.
d) Expert opinions. The three previous categories involve a wide search of software testing studies about MapReduce programs, but other relevant studies might not be found. The opinion of authors with experience in software testing and MapReduce, together with the other related mapping study 25, could provide potential primary studies.
This large search is difficult to carry out because the software engineering search engines do not adequately support mapping study searches 53. To avoid this problem, we created a program that splits the 9384 combinations of search terms into 2346 searches and simulates a human performing these requests. The potential primary studies are obtained after approximately a week to avoid bans in the search engines due to a high number of requests.
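The following sketch is our reconstruction of how such a batching program could work; it is not the authors' actual tool. It builds the title queries from the technology and quality terms (only three invented samples of each are shown, whereas the study used 92 and 102), groups them four per request into the 2346 searches, and paces the requests over roughly a week; submitToSearchEngine is a hypothetical placeholder for the engine's HTTP API.

import java.util.ArrayList;
import java.util.List;

public class SearchTermBatcher {

  public static void main(String[] args) throws InterruptedException {
    // Only a few sample terms are listed; the real study used 92 and 102 terms respectively.
    List<String> technologyTerms = List.of("MapReduce", "Hadoop", "Spark");
    List<String> qualityTerms = List.of("testing", "performance", "reliability");

    // Build every "technology AND quality" title query (92 x 102 = 9384 in the study).
    List<String> combinations = new ArrayList<>();
    for (String tech : technologyTerms) {
      for (String quality : qualityTerms) {
        combinations.add("title:(\"" + tech + "\" AND \"" + quality + "\")");
      }
    }

    // Group four combinations per request (9384 / 4 = 2346 searches) and pace them over a week.
    int perRequest = 4;
    int totalRequests = (combinations.size() + perRequest - 1) / perRequest;
    long delayMillis = 7L * 24 * 60 * 60 * 1000 / totalRequests;
    for (int i = 0; i < combinations.size(); i += perRequest) {
      List<String> batch = combinations.subList(i, Math.min(i + perRequest, combinations.size()));
      String query = String.join(" OR ", batch);
      System.out.println("Searching: " + query);
      // submitToSearchEngine(query);   // hypothetical call to the search engine
      Thread.sleep(delayMillis);        // simulate a human pacing the requests
    }
  }
}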



Frequently Asked Questions (11)
Q1. What contributions have the authors mentioned in the paper "Testing MapReduce programs: A systematic mapping study"?

This paper characterizes their state-of-the-art identifying the trends and gaps through a mapping study. The principal reasons for testing the programs are performance issues, potential failures, issues related to the data, or to satisfy the agreements with efficient resources. Despite the fact that functionality is relevant to satisfy the business requirements, few studies are focused on functional testing and can indicate a potential research challenge. 

The non-JCR journals and non-CORE conferences related to software testing or Big Data could be a good source of potential primary studies. 

The MRFlow testing technique 116 generates the test coverage items that can be used to generate the test inputs based on the data-flow technique adapted to the MapReduce processing model.

The data processed by these functions are in the form of <key, value> pairs in which the key is the identifier of each subproblem and the value contains the information needed to solve it. 

The most frequent reasons are based on performance issues (analyze, optimize and fulfil performance goals), existence of several and specific failures, the type and quality of the data processed by these programs, and testing to predict and select efficiently the resources. 

Then the prediction models can be designed with a more standardized subset of parameters that have a notable influence on performance.

When the programs executed in the public cloud have deadline requirements to satisfy, the performance can be predicted with a Locally Weighted Linear Regression model considering the previous executions and the data executed in parallel 91.

In the case of I/O-intensive programs in the cloud, the performance can be predicted using a CART (Classification And Regression Tree) model 90.

The potential primary studies are obtained after approximately a week to avoid bans in the search engines due to a high number of requests. 

The performance of the MapReduce and Big Data applications can also be evaluated through large scale stochastic models by Mean Field Analysis 77. 

There are several technologies to execute and manage MapReduce programs such as Spark 3, Flink 4 and Hadoop 5, all broadly implemented in industry 6.