Showing papers by "Joseph M. Hellerstein published in 2015"

PDF

Open Access

Proceedings Article•

Predictive Interaction for Data Transformation

[...]

Jeffrey Heer¹, Joseph M. Hellerstein², Sean Kandel³•Institutions (3)

University of Washington¹, University of California, Berkeley², Stanford University³

01 Jan 2015

TL;DR: Predictive Interaction is presented, a framework for interactive systems that shifts the burden of technical specification from users to algorithms, while preserving human guidance and expressive power.

...read moreread less

Abstract: The human work involved in data transformation represents a major bottleneck for today’s data-driven organizations. In response, we present Predictive Interaction, a framework for interactive systems that shifts the burden of technical specification from users to algorithms, while preserving human guidance and expressive power.

...read moreread less

94 citations

Proceedings Article•DOI•

Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

[...]

Peter Bailis¹, Alan Fekete², Michael J. Franklin¹, Ali Ghodsi¹, Joseph M. Hellerstein¹, Ion Stoica¹ - Show less +2 more•Institutions (2)

University of California, Berkeley¹, University of Sydney²

27 May 2015

TL;DR: This work quantitatively analyzes the use of feral mechanisms for maintaining database integrity in a range of open source applications written using the Ruby on Rails ORM and finds that feral invariants are the most popular means of ensuring integrity.

...read moreread less

Abstract: The rise of data-intensive "Web 2.0" Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications' use and disuse of database concurrency control mechanisms. Specifically, we focus our study on the common use of feral, or application-level, mechanisms for maintaining database integrity, which, across a range of ORM systems, often take the form of declarative correctness criteria, or invariants. We quantitatively analyze the use of these mechanisms in a range of open source applications written using the Ruby on Rails ORM and find that feral invariants are the most popular means of ensuring integrity (and, by usage, are over 37 times more popular than transactions). We evaluate which of these feral invariants actually ensure integrity (by usage, up to 86.9%) and which---due to concurrency errors and lack of database support---may lead to data corruption (the remainder), which we experimentally quantify. In light of these findings, we present recommendations for database system designers for better supporting these modern ORM programming patterns, thus eliminating their adverse effects on application integrity.

...read moreread less

66 citations

Proceedings Article•DOI•

Lineage-driven Fault Injection

[...]

Peter Alvaro¹, Joshua Rosen¹, Joseph M. Hellerstein¹•Institutions (1)

University of California, Berkeley¹

27 May 2015

TL;DR: MOLLY is presented, a prototype of lineage-driven fault injection that exploits a novel combination of data lineage techniques from the database literature and state-of-the-art satisfiability testing that finds bugs in fault-tolerant data management systems rapidly.

...read moreread less

Abstract: In large-scale data management systems, failure is practically a certainty. Fault-tolerant protocols and components are notoriously difficult to implement and debug. Worse still, choosing existing fault-tolerance mechanisms and integrating them correctly into complex systems remains an art form, and programmers have few tools to assist them. We propose a novel approach for discovering bugs in fault-tolerant data management systems: lineage-driven fault injection. A lineage-driven fault injector reasons backwards from correct system outcomes to determine whether failures in the execution could have prevented the outcome. We present MOLLY, a prototype of lineage-driven fault injection that exploits a novel combination of data lineage techniques from the database literature and state-of-the-art satisfiability testing. If fault-tolerance bugs exist for a particular configuration, MOLLY finds them rapidly, in many cases using a order of magnitude fewer executions than random fault injection. Otherwise, MOLLY certifies that the code is bug-free for that configuration.

...read moreread less

60 citations

Posted Content•

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

[...]

Joseph E. Gonzalez, Peter Bailis, Michael I. Jordan, Michael J. Franklin, Joseph M. Hellerstein, Ali Ghodsi, Ion Stoica - Show less +3 more

24 Oct 2015-arXiv: Databases

TL;DR: This work investigates the use of asynchronous sideways information passing (ASIP) that presents single-stage parallel iterators with a Volcano-like intra-operator iterator that can be used for asynchronous information passing and evaluates an implementation of ASIPs within on Apache Spark that exhibits considerable speedups as well as a rich set of performance trade-offs.

...read moreread less

Abstract: Scalable distributed dataflow systems have recently experienced widespread adoption, with commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines routinely supporting increasingly sophisticated analytics tasks (e.g., support vector machines, logistic regression, collaborative filtering). However, these systems' synchronous (often Bulk Synchronous Parallel) dataflow execution model is at odds with an increasingly important trend in the machine learning community: the use of asynchrony via shared, mutable state (i.e., data races) in convex programming tasks, which has---in a single-node context---delivered noteworthy empirical performance gains and inspired new research into asynchronous algorithms. In this work, we attempt to bridge this gap by evaluating the use of lightweight, asynchronous state transfer within a commodity dataflow engine. Specifically, we investigate the use of asynchronous sideways information passing (ASIP) that presents single-stage parallel iterators with a Volcano-like intra-operator iterator that can be used for asynchronous information passing. We port two synchronous convex programming algorithms, stochastic gradient descent and the alternating direction method of multipliers (ADMM), to use ASIPs. We evaluate an implementation of ASIPs within on Apache Spark that exhibits considerable speedups as well as a rich set of performance trade-offs in the use of these asynchronous algorithms.

...read moreread less

14 citations

Journal Article•DOI•

Putting Logic-Based Distributed Systems on Stable Grounds

[...]

Tom J. Ameloot, Jan Van den Bussche, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein - Show less +1 more

20 Jul 2015-arXiv: Logic in Computer Science

TL;DR: In this paper, a declarative, model-based semantics for Datalog with negation is proposed, which is based on the well-known stable model semantics for the language.

...read moreread less

Abstract: In the Declarative Networking paradigm, Datalog-like languages are used to express distributed computations. Whereas recently formal operational semantics for these languages have been developed, a corresponding declarative semantics has been lacking so far. The challenge is to capture precisely the amount of nondeterminism that is inherent to distributed computations due to concurrency, networking delays, and asynchronous communication. This paper shows how a declarative, model-based semantics can be obtained by simply using the well-known stable model semantics for Datalog with negation. We show that the model-based semantics matches previously proposed formal operational semantics.

...read moreread less

4 citations