Proceedings ArticleDOI
New ideas track: testing mapreduce-style programs
Christoph Csallner, Leonidas Fegaras, Chengkai Li
- pp. 504–507
TL;DR
A novel technique is presented that systematically searches for bugs in MapReduce applications and generates corresponding test cases, by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them against the program under test.
Abstract
MapReduce has become a common programming model for processing very large amounts of data, which is needed in a spectrum of modern computing applications. Today, several MapReduce implementations and execution systems exist, and many MapReduce programs are being developed and deployed in practice. However, developing MapReduce programs is not always an easy task: the programming model makes programs prone to several MapReduce-specific bugs. That is, to produce deterministic results, a MapReduce program needs to satisfy certain high-level correctness conditions. A violating program may yield different output values on the same input data, depending on low-level infrastructure events such as network latency and scheduling decisions. Current MapReduce systems and tools lack support for checking these conditions and reporting violations.
This paper presents a novel technique that systematically searches for such bugs in MapReduce applications and generates corresponding test cases. The technique works by encoding the high-level MapReduce correctness conditions as symbolic program constraints and checking them for the program under test. To the best of our knowledge, this is the first approach to address this problem of MapReduce-style programming.
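The order-dependent nondeterminism the abstract describes can be illustrated with a minimal sketch. The reducer below (a hypothetical `buggy_reduce` in plain Python, not code from the paper or a real Hadoop reducer) violates the commutativity condition, so its output depends on the order in which the shuffle phase happens to deliver values:

```python
def buggy_reduce(key, values):
    """Hypothetical reducer: treats the first value as an initial balance
    and the remaining values as withdrawals. Subtraction is not commutative,
    so the result depends on the delivery order of `values`, which the
    MapReduce infrastructure does not guarantee."""
    it = iter(values)
    total = next(it)      # order-sensitive: the "first" value is special
    for v in it:
        total -= v
    return total

# Same multiset of values, two possible shuffle orders, two different outputs:
print(buggy_reduce("acct", [100, 10, 5]))  # 85
print(buggy_reduce("acct", [10, 100, 5]))  # -95
```

A correct reducer would compute the same result for every ordering of its input values (e.g., a plain sum), which is exactly the kind of high-level condition the paper's technique encodes as symbolic constraints.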
Citations
Proceedings ArticleDOI
Nondeterminism in MapReduce considered harmful? an empirical study on non-commutative aggregators in MapReduce programs
Tian Xiao, Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Sean McDirmid, Wei Lin, Wenguang Chen, Lidong Zhou
TL;DR: Although non-commutative reduce functions led to five bugs in a sample of well-tested production programs, the study surprisingly finds that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties.
Journal ArticleDOI
Spotting Code Optimizations in Data-Parallel Pipelines through PeriSCOPE
Xuepeng Fan, Zhenyu Guo, Hai Jin, Xiaofei Liao, Jiaxing Zhang, Hucheng Zhou, Sean McDirmid, Wei Lin, Jingren Zhou, Lidong Zhou
TL;DR: PeriSCOPE is described, which automatically optimizes a data-parallel program's procedural code in the context of data flow that is reconstructed from the program's pipeline topology, and leverages symbolic execution to enlarge the scope of such optimizations by eliminating dead code.
Book ChapterDOI
Commutativity of Reducers
TL;DR: This paper introduces a syntactic subset of integer programs, termed integer reducers, to model real-world reducers, and shows that checking commutativity of integer reducers over unbounded lists of exact integers is undecidable.
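Since the general commutativity problem is undecidable, a checker can only decide bounded instances. A minimal bounded check (hypothetical helper `is_commutative_on`, written in Python; an illustration of the idea, not the paper's technique) enumerates all orderings of one concrete value list:

```python
from itertools import permutations

def is_commutative_on(reducer, values):
    """Bounded check: does `reducer` return the same result for every
    ordering of this one value list? Passing this check does not prove
    commutativity in general -- that problem is undecidable -- but a
    failure is a concrete counterexample."""
    results = {reducer(list(p)) for p in permutations(values)}
    return len(results) == 1

# A commutative reducer passes; an order-sensitive one is caught:
assert is_commutative_on(sum, [3, 1, 2])
assert not is_commutative_on(lambda vs: vs[0] - sum(vs[1:]), [3, 1, 2])
```

Enumeration is exponential in the list length, which is one reason symbolic approaches (as in the paper above) encode the condition as constraints instead of testing orderings exhaustively.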
Journal ArticleDOI
Method for testing the fault tolerance of MapReduce frameworks
TL;DR: A method for creating a set of fault cases derived from a Petri net (PN), and a framework for automating the execution of these fault cases in a distributed system, providing network reliability enhancements as a byproduct.
Proceedings ArticleDOI
Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop
TL;DR: This paper proposes a holistic approach to verifying the correctness of Hadoop systems using model-checking techniques: it models Hadoop's parallel architecture, constrains it to valid start-up orderings, and identifies and proves properties including data locality, deadlock-freeness, and non-termination, among others.
References
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Book
Hadoop: The Definitive Guide
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Proceedings ArticleDOI
Dryad: distributed data-parallel programs from sequential building blocks
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Journal ArticleDOI
DART: directed automated random testing
TL;DR: DART is a new tool for automatically testing software that combines three main techniques: automated extraction of a program's interface with its external environment using static source-code parsing; automatic generation of a test driver that performs random testing through this interface; and dynamic analysis of how the program behaves under random testing, with automatic generation of new test inputs to systematically direct execution along alternative program paths.