
Proceedings ArticleDOI

Getting ready for BigData testing: A practitioner's perception

04 Jul 2013 - pp. 1-5

TL;DR: Two specialized types of testing are considered to learn the intricacies of Big Data testing, and thoughts on Big Test Data Management are also presented.
Abstract: Big Data is already here, and developers and analysts are working with it using several supporting and upcoming frameworks and technologies. The storage and retrieval systems, access layers and processes for Big Data are evolving day by day. Test architects and testing teams are not excluded from this big scenario. This paper focuses on some of the challenges test teams will face in the near future. Two specialized types of testing are considered to learn the intricacies, and thoughts on Big Test Data Management are also presented.
Topics: Analytics (61%), Big data (60%), Test data (58%), Data warehouse (55%), Test strategy (53%)
Citations

Journal ArticleDOI
TL;DR: New testing techniques are proposed that aim to detect design faults by simulating different infrastructure configurations; using random testing and partition testing together with combinatorial testing, they generate a representative set of configurations that, as a whole, are more likely to reveal failures.
Abstract: New processing models are being adopted in Big Data engineering to overcome the limitations of traditional technology. Among them, MapReduce stands out by allowing the processing of large volumes of data over a distributed infrastructure that can change during runtime. The developer only designs the functionality of the program; its execution is managed by a distributed system. As a consequence, a program can behave differently at each execution because it is automatically adapted to the resources available at each moment. Therefore, when the program has a design fault, it could be revealed in some executions and masked in others. During testing, however, these faults are usually masked because the test infrastructure is stable; they are only revealed in production because the environment is more aggressive, with infrastructure failures among other reasons. This paper proposes new testing techniques that aim to detect these design faults by simulating different infrastructure configurations. The techniques generate a representative set of infrastructure configurations that, as a whole, are more likely to reveal failures, using random testing and partition testing together with combinatorial testing. They are automated by a test execution engine called MRTest that is able to detect these faults using only the test input data, regardless of the expected output. Our empirical evaluation shows that MRTest can automatically detect these design faults within a reasonable time.
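
The configuration-generation step lends itself to a compact illustration. The following self-contained Java sketch shows the idea only, not MRTest's actual API: the InfraConfig record and its parameter domains are assumptions made here, with a full cross-product standing in for the paper's partition and combinatorial strategies and uniform sampling standing in for random testing.

```java
// Hedged sketch of generating infrastructure configurations for testing.
// InfraConfig and its fields are illustrative assumptions, not MRTest's API.
import java.util.*;

public class InfraConfigGeneration {

    // One simulated infrastructure configuration for a MapReduce execution.
    record InfraConfig(int mappers, int reducers, boolean combinerRuns, boolean nodeFailure) {}

    public static void main(String[] args) {
        // Partitioned domains for each infrastructure parameter.
        int[] mappers = {1, 2, 4};
        int[] reducers = {1, 2};
        boolean[] bools = {false, true};

        // Combinatorial generation: every combination of the partitions.
        List<InfraConfig> combinatorial = new ArrayList<>();
        for (int m : mappers)
            for (int r : reducers)
                for (boolean c : bools)
                    for (boolean f : bools)
                        combinatorial.add(new InfraConfig(m, r, c, f));

        // Random generation: sample configurations uniformly from the domains.
        Random rnd = new Random(42);
        List<InfraConfig> random = new ArrayList<>();
        for (int i = 0; i < 5; i++)
            random.add(new InfraConfig(
                    mappers[rnd.nextInt(mappers.length)],
                    reducers[rnd.nextInt(reducers.length)],
                    rnd.nextBoolean(),
                    rnd.nextBoolean()));

        System.out.println(combinatorial.size() + " combinatorial configs, e.g. " + combinatorial.get(0));
        System.out.println(random.size() + " random configs, e.g. " + random.get(0));
        // Per the abstract, MRTest would then execute the program under each
        // configuration and flag a design fault when outputs disagree for the
        // same test input, with no expected output required.
    }
}
```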

12 citations


Cites background from "Getting ready for BigData testing: ..."

  • ...In recent years, this field has seen great progress [35], but concerning the testing of Big Data applications, there remain several challenges [36], [37]....



Proceedings ArticleDOI
30 Aug 2015
TL;DR: This paper proposes a testing technique called MRFlow that is based on data-flow test criteria and oriented to the analysis of transformations between input and output in order to detect defects in MapReduce programs.
Abstract: MapReduce is a parallel data processing paradigm oriented to processing large volumes of information in data-intensive applications, such as Big Data environments. A characteristic of these applications is that they can have different data sources and data formats. For these reasons, the inputs could contain some poor-quality data that could produce a failure if the program functionality does not properly handle the variety of input data. The output of these programs is obtained from a number of input transformations that represent the program logic. This paper proposes a testing technique called MRFlow that is based on data-flow test criteria and oriented to the analysis of transformations between the input and the output in order to detect defects in MapReduce programs. MRFlow is applied to some MapReduce programs and detects several defects.
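
To make the input-variety problem concrete, here is a small hypothetical Java sketch in the spirit of the paper's motivation (it does not implement MRFlow's data-flow criteria): a mapper-style transformation accepts two record formats, and the test inputs are chosen so that each input-to-output transformation path, including the malformed-record path, is exercised at least once. All names (parseTemperature, the record formats) are made up for illustration.

```java
// Illustrative sketch only: a mapper-like transformation that must handle
// input variety, plus test inputs covering each transformation path.
import java.util.*;

public class InputVarietySketch {

    // Mapper-style transformation: extract a (station, temperature) pair.
    // Two accepted formats plus a guard for poor-quality records:
    //   "station,celsius"  or  "station;fahrenheit F"
    static Optional<Map.Entry<String, Double>> parseTemperature(String line) {
        try {
            if (line.contains(";")) {
                String[] parts = line.split(";");
                double f = Double.parseDouble(parts[1].replace("F", "").trim());
                return Optional.of(Map.entry(parts[0], (f - 32) * 5.0 / 9.0));
            }
            String[] parts = line.split(",");
            return Optional.of(Map.entry(parts[0], Double.parseDouble(parts[1])));
        } catch (RuntimeException e) {
            return Optional.empty();   // malformed record: drop instead of crash
        }
    }

    public static void main(String[] args) {
        // One test input per transformation path, including the malformed case.
        List<String> inputs = List.of("oviedo,21.5", "gijon;70 F", "corrupt-line");
        for (String in : inputs)
            System.out.println(in + " -> " + parseTemperature(in));
    }
}
```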

10 citations


Cites methods from "Getting ready for BigData testing: ..."

  • ...Keywords Software Testing, Data Flow Testing, MapReduce programs....



Journal ArticleDOI
TL;DR: MapReduce is a processing model used in Big Data that facilitates the analysis of large volumes of data under a distributed architecture.
Abstract: Work supported in part by projects TIN2016-76956-C3-1-R and TIN2013-46928-C3-1, funded by the Ministry of Economy and Competitiveness of Spain, and by GRUPIN14-007, funded by the Principality of Asturias (Spain) and FEDER.

7 citations


Proceedings ArticleDOI
25 Jul 2017
TL;DR: This work proposes an automatic test framework implementing a novel testing approach called Ex Vivo that can identify a fault in a few seconds, after which the program can be stopped, not only avoiding an incorrect output but also saving the money, time and energy of production resources.
Abstract: Big Data programs are those that process volumes of data exceeding the capabilities of traditional technologies. Among newly proposed processing models, MapReduce stands out as it allows the analysis of schema-less data in large distributed environments with frequent infrastructure failures. Functional faults in MapReduce are hard to detect in a testing/pre-production environment due to its distributed characteristics. We propose an automatic test framework implementing a novel testing approach called Ex Vivo. The framework employs data from production but executes the tests in a laboratory to avoid side effects on the application. Faults are detected automatically, without human intervention, by checking whether the same data would generate different outputs under different infrastructure configurations. The framework (MrExist) is validated with a real-world program. MrExist can identify a fault in a few seconds; the program can then be stopped, not only avoiding an incorrect output, but also saving the money, time and energy of production resources.
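
The Ex Vivo oracle can be sketched in a few lines. The following Java snippet is an illustration under stated assumptions, not MrExist itself: jobUnderTest is a deliberately faulty stand-in, the production sample is hard-coded, and disagreement between a baseline configuration and lab variants is treated as the failure signal, since no expected output is available.

```java
// Hedged sketch of a differential, Ex Vivo-style check: replay production
// data in the lab under several configurations and stop on disagreement.
import java.util.*;
import java.util.function.BiFunction;

public class ExVivoSketch {

    public static void main(String[] args) {
        // Stand-in for records sampled from the production stream.
        List<Integer> productionSample = List.of(4, 8, 15, 16, 23, 42);

        // The job under test, parameterized by the number of partitions the
        // lab run simulates: a deliberately faulty distributed max that
        // silently drops part of the data when more than one partition is used.
        BiFunction<List<Integer>, Integer, Integer> jobUnderTest = (data, parts) -> {
            int max = Integer.MIN_VALUE;
            int limit = parts == 1 ? data.size() : data.size() - data.size() / parts;
            for (int i = 0; i < limit; i++) max = Math.max(max, data.get(i));
            return max;
        };

        int baseline = jobUnderTest.apply(productionSample, 1);
        for (int parts : new int[] {2, 3, 4}) {          // alternative lab configs
            int out = jobUnderTest.apply(productionSample, parts);
            if (out != baseline) {
                // Fault detected in seconds: stop before production emits the
                // wrong output, as the paper's workflow suggests.
                System.out.println("Mismatch with " + parts + " partitions: "
                        + out + " vs baseline " + baseline + " -> stop the job");
                return;
            }
        }
        System.out.println("All configurations agree: " + baseline);
    }
}
```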

5 citations


Cites background from "Getting ready for BigData testing: ..."

  • ...This field has experienced great progress in recent years [24], but there are still some challenges in testing Big Data programs [25], [26]....



Proceedings ArticleDOI
Jesus Moran, Bibiano Rivas, Claudio de la Riva, Javier Tuya, and 2 more authors (2 institutions)
01 Aug 2016
TL;DR: A testing technique is proposed to generate different infrastructure configurations for a given test input data, and then the program is executed in these configurations in order to reveal functional faults.
Abstract: Programs that process a large volume of data generally run on a distributed and parallel architecture, such as programs implemented in the MapReduce processing model. In these programs developers can abstract away the infrastructure where the program will run and focus on the functional issues. However, the infrastructure configuration and its state cause different parallel executions of the program, and some of them can lead to functional faults that are hard to reveal. In general, the infrastructure that executes the program is not considered during testing, because the tests usually contain little input data and parallelization is therefore not necessary. In this paper a testing technique is proposed that generates different infrastructure configurations for a given test input, and then executes the program under these configurations in order to reveal functional faults. The technique is automated by a test engine and is applied to a case study. As a result, several infrastructure configurations are automatically generated and executed for a test case, revealing a functional fault that is then fixed by the developer.
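
The kind of fault this technique targets can be shown with a classic example. The Java sketch below is illustrative only (the paper's test engine is not reproduced): the program averages values inside a combiner, so a single input split yields the correct result and masks the fault, while generated configurations with several splits reveal it.

```java
// Sketch of a design fault that stable test infrastructure masks: averaging
// inside a combiner is correct with one split, wrong with several.
import java.util.*;

public class MaskedFaultSketch {

    static double average(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    // Simulated job: split the input, run the (faulty) averaging combiner per
    // split, then average the partial averages in a single reducer.
    static double faultyAverageJob(List<Double> input, int splits) {
        List<List<Double>> partitions = new ArrayList<>();
        for (int i = 0; i < splits; i++) partitions.add(new ArrayList<>());
        for (int i = 0; i < input.size(); i++)
            partitions.get(i % splits).add(input.get(i));

        List<Double> partials = new ArrayList<>();
        for (List<Double> p : partitions)
            if (!p.isEmpty()) partials.add(average(p));  // combiner output
        return average(partials);                        // reducer
    }

    public static void main(String[] args) {
        List<Double> testInput = List.of(1.0, 2.0, 3.0, 4.0, 10.0); // true mean 4.0
        for (int splits : new int[] {1, 2, 4})           // generated configurations
            System.out.println(splits + " split(s): " + faultyAverageJob(testInput, splits));
        // 1 split masks the fault (prints 4.0); 2 and 4 splits reveal it (!= 4.0).
    }
}
```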

4 citations


Cites methods from "Getting ready for BigData testing: ..."

  • ...Despite the testing challenges of Big Data applications [15, 16] and the progress in testing techniques [17], little effort is focused on testing MapReduce programs [18], one of the principal paradigms of Big Data [19]....



References

01 Jan 2015
Abstract: The Workshop was hosted by The Law and Technology Centre of the Faculty of Law, The University of Hong Kong

530 citations


01 Jan 2012
TL;DR: Preliminary infrastructure tuning results in sorting 1TB of data in 14 minutes on 10 Power 730 machines running IBM InfoSphere BigInsights, and further improvement is expected, among other factors, on the new IBM PowerLinux 7R2 systems.
Abstract: The use of Big Data underpins critical activities in all sectors of our society. Achieving the full transformative potential of Big Data in this increasingly digital world requires both new data analysis algorithms and a new class of systems to handle the dramatic data growth, the demand to integrate structured and unstructured data analytics, and the increasing computing needs of massive-scale analytics. In this paper, we discuss several Big Data research activities at IBM Research: (1) Big Data benchmarking and methodology; (2) workload-optimized systems for Big Data; (3) a case study of Big Data workloads on IBM Power systems. In (3), we show that preliminary infrastructure tuning results in sorting 1TB of data in 14 minutes on 10 Power 730 machines running IBM InfoSphere BigInsights. Further improvement is expected, among other factors, on the new IBM PowerLinux 7R2 systems.
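
As a rough back-of-the-envelope check of the reported figure (assuming decimal units and ignoring job startup overhead), the claimed sort rate works out to roughly a gigabyte per second across the cluster:

```latex
\frac{1\,\text{TB}}{14\,\text{min}}
  = \frac{10^{12}\,\text{B}}{840\,\text{s}}
  \approx 1.19\ \text{GB/s aggregate}
  \approx 119\ \text{MB/s per machine (10 machines)}
```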

9 citations


"Getting ready for BigData testing: ..." refers background in this paper

  • ...Volume is the enormity of data, variety is the heterogeneity of data, velocity is the rate of transfer (speed) of data that comes in, flows within and goes out, and veracity is the truthiness of the data or information [1]....



Performance Metrics
No. of citations received by the paper in previous years

Year    Citations
2019    1
2018    1
2017    1
2016    2
2015    1