Proceedings ArticleDOI

New ideas track: testing mapreduce-style programs

TL;DR: A novel technique is presented that systematically searches for bugs in MapReduce applications and generates corresponding test cases by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test.
Abstract
MapReduce has become a common programming model for processing very large amounts of data, which is needed in a spectrum of modern computing applications. Today several MapReduce implementations and execution systems exist, and many MapReduce programs are being developed and deployed in practice. However, developing MapReduce programs is not always an easy task. The programming model makes programs prone to several MapReduce-specific bugs. That is, to produce deterministic results, a MapReduce program needs to satisfy certain high-level correctness conditions. A violating program may yield different output values on the same input data, depending on low-level infrastructure events such as network latency, scheduling decisions, etc. Current MapReduce systems and tools lack support for checking these conditions and reporting violations. This paper presents a novel technique that systematically searches for such bugs in MapReduce applications and generates corresponding test cases. The technique works by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test. To the best of our knowledge, this is the first approach to addressing this problem of MapReduce-style programming.
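The nondeterminism the abstract describes can be made concrete with a small sketch. The example below is illustrative only (the function names are not from the paper): a reduce function based on subtraction is neither commutative nor associative, so the order in which the runtime happens to deliver values to the reducer changes the final output, even though the multiset of inputs is identical.

```python
# Hypothetical sketch of a MapReduce-specific bug: a reducer that is not
# commutative/associative yields order-dependent results.
from functools import reduce


def buggy_reducer(acc, value):
    # Subtraction is neither commutative nor associative, so this reducer
    # violates the high-level MapReduce correctness conditions.
    return acc - value


def run_reduce(values):
    # Models the runtime folding the values delivered for one key.
    return reduce(buggy_reducer, values)


values = [10, 3, 2]
out_a = run_reduce(values)                   # (10 - 3) - 2 = 5
out_b = run_reduce(list(reversed(values)))   # (2 - 3) - 10 = -11
assert out_a != out_b  # same inputs, different delivery order, different result
```

A scheduling decision as mundane as which mapper finishes first is enough to flip between these two outcomes in a real deployment, which is why such violations are hard to catch by ordinary testing.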


Citations
Proceedings ArticleDOI

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs

TL;DR: Although non-commutative reduce functions led to five bugs in a sample of well-tested production programs, the study surprisingly finds that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties.
Journal ArticleDOI

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE

TL;DR: PeriSCOPE is described, which automatically optimizes a data-parallel program's procedural code in the context of data flow that is reconstructed from the program's pipeline topology, and leverages symbolic execution to enlarge the scope of such optimizations by eliminating dead code.
Book ChapterDOI

Commutativity of Reducers

TL;DR: This paper introduces a syntactic subset of integer programs, termed integer reducers, to model real-world reducers, and shows that checking commutativity of integer reducers over unbounded lists of exact integers is undecidable.
Journal ArticleDOI

Method for testing the fault tolerance of MapReduce frameworks

TL;DR: A method to create a set of fault cases, derived from a Petri net (PN), and a framework to automate the execution of these fault cases in a distributed system, providing network reliability enhancements as a byproduct.
Proceedings ArticleDOI

Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop

TL;DR: This paper proposes a holistic approach to verifying the correctness of Hadoop systems using model checking techniques; it models Hadoop's parallel architecture, constrains it to valid start-up orderings, and identifies and proves properties including data locality, deadlock-freedom, and non-termination, among others.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
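The programming model from the two references above is easy to state in miniature. The following single-process sketch of the canonical word-count example is illustrative only: a real implementation runs the map and reduce phases in parallel across a cluster and handles failures, as those papers describe.

```python
# Minimal single-process sketch of the MapReduce word-count example.
from collections import defaultdict


def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce phase: sum the partial counts collected for one key.
    return word, sum(counts)


def mapreduce(lines):
    # Group intermediate pairs by key (the "shuffle" step), then reduce.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


print(mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Because addition is commutative and associative, this reducer produces the same output regardless of how the runtime orders or partitions the intermediate values, which is exactly the correctness condition the paper under discussion checks.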
Book

Hadoop: The Definitive Guide

Tom White
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Proceedings ArticleDOI

Dryad: distributed data-parallel programs from sequential building blocks

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Journal ArticleDOI

DART: directed automated random testing

TL;DR: DART is a new tool for automatically testing software that combines three main techniques: automated extraction of the interface of a program with its external environment using static source-code parsing; automatic generation of a test driver that performs random testing over this interface; and dynamic analysis of how the program behaves under random testing, with automatic generation of new test inputs to direct execution systematically along alternative program paths.