Proceedings ArticleDOI

New ideas track: testing mapreduce-style programs

TL;DR: A novel technique is presented that systematically searches for bugs in MapReduce applications and generates corresponding test cases by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test.
Abstract
MapReduce has become a common programming model for processing very large amounts of data, which is needed in a spectrum of modern computing applications. Today several MapReduce implementations and execution systems exist, and many MapReduce programs are being developed and deployed in practice. However, developing MapReduce programs is not always an easy task. The programming model makes programs prone to several MapReduce-specific bugs. That is, to produce deterministic results, a MapReduce program needs to satisfy certain high-level correctness conditions. A violating program may yield different output values on the same input data, depending on low-level infrastructure events such as network latency, scheduling decisions, etc. Current MapReduce systems and tools lack support for checking these conditions and reporting violations. This paper presents a novel technique that systematically searches for such bugs in MapReduce applications and generates corresponding test cases. The technique works by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test. To the best of our knowledge, this is the first approach to addressing this problem of MapReduce-style programming.
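The nondeterminism the abstract describes can be made concrete with a small sketch. The example below is illustrative only (the function names are not from the paper): a reduce function based on subtraction is neither commutative nor associative, so the order in which the runtime happens to deliver values to the reducer changes the final output, even though the multiset of inputs is identical.

```python
# Hypothetical sketch of a MapReduce-specific bug: a reducer that is not
# commutative/associative yields order-dependent results.
from functools import reduce


def buggy_reducer(acc, value):
    # Subtraction is neither commutative nor associative, so this reducer
    # violates the high-level MapReduce correctness conditions.
    return acc - value


def run_reduce(values):
    # Models the runtime folding the values delivered for one key.
    return reduce(buggy_reducer, values)


values = [10, 3, 2]
out_a = run_reduce(values)                   # (10 - 3) - 2 = 5
out_b = run_reduce(list(reversed(values)))   # (2 - 3) - 10 = -11
assert out_a != out_b  # same inputs, different delivery order, different result
```

A scheduling decision as mundane as which mapper finishes first is enough to flip between these two outcomes in a real deployment, which is why such violations are hard to catch by ordinary testing.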


Citations
Proceedings ArticleDOI

Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs

TL;DR: Although non-commutative reduce functions led to five bugs in a sample of well-tested production programs, the study surprisingly finds that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties.
Journal ArticleDOI

Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE

TL;DR: PeriSCOPE is described, which automatically optimizes a data-parallel program's procedural code in the context of data flow that is reconstructed from the program's pipeline topology, and leverages symbolic execution to enlarge the scope of such optimizations by eliminating dead code.
Book ChapterDOI

Commutativity of Reducers

TL;DR: This paper introduces a syntactic subset of integer programs, termed integer reducers, to model real-world reducers, and shows that checking commutativity of integer reducers over unbounded lists of exact integers is undecidable.
Journal ArticleDOI

Method for testing the fault tolerance of MapReduce frameworks

TL;DR: A method to create a set of fault cases, derived from a Petri net (PN), and a framework to automate the execution of these fault cases in a distributed system, providing network reliability enhancements as a byproduct.
Proceedings ArticleDOI

Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop

TL;DR: This paper proposes a holistic approach to verifying the correctness of Hadoop systems using model checking techniques; it models Hadoop's parallel architecture, constrains it to valid start-up orderings, and identifies and proves properties including data locality, deadlock-freedom, and non-termination, among others.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
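The programming model from the two references above is easy to state in miniature. The following single-process sketch of the canonical word-count example is illustrative only: a real implementation runs the map and reduce phases in parallel across a cluster and handles failures, as those papers describe.

```python
# Minimal single-process sketch of the MapReduce word-count example.
from collections import defaultdict


def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce phase: sum the partial counts collected for one key.
    return word, sum(counts)


def mapreduce(lines):
    # Group intermediate pairs by key (the "shuffle" step), then reduce.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


print(mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Because addition is commutative and associative, this reducer produces the same output regardless of how the runtime orders or partitions the intermediate values, which is exactly the correctness condition the paper under discussion checks.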
Book

Hadoop: The Definitive Guide

Tom White
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Proceedings ArticleDOI

Dryad: distributed data-parallel programs from sequential building blocks

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Journal ArticleDOI

DART: directed automated random testing

TL;DR: DART is a new tool for automatically testing software that combines three main techniques: automated extraction of the interface of a program with its external environment using static source-code parsing; automatic generation of a test driver that performs random testing over this interface; and dynamic analysis of how the program behaves under random testing, with automatic generation of new test inputs to direct execution systematically along alternative program paths.