
Showing papers presented at "Conference on Object-Oriented Programming Systems, Languages, and Applications in 2014"


Proceedings ArticleDOI
15 Oct 2014
TL;DR: This paper identifies failure-atomic sections of code based on existing critical sections, describes a log-based implementation that can be used to recover a consistent state after a failure, identifies the need to rapidly flush CPU caches as a core implementation bottleneck, and suggests partial solutions.
Abstract: Non-volatile main memory, such as memristors or phase change memory, can revolutionize the way programs persist data. In-memory objects can themselves be persistent without the need for a separate persistent data storage format. However, the challenge is to ensure that such data remains consistent if a failure occurs during execution. In this paper, we present our system, called Atlas, which adds durability semantics to lock-based code, typically allowing us to automatically maintain a globally consistent state even in the presence of failures. We identify failure-atomic sections of code based on existing critical sections and describe a log-based implementation that can be used to recover a consistent state after a failure. We discuss several subtle semantic issues and implementation tradeoffs. We identify the need to rapidly flush CPU caches as a core implementation bottleneck and suggest partial solutions. Experimental results confirm the practicality of our approach and provide insight into the overheads of such a system.

271 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: Chisel, as discussed by the authors, is a system for reliability- and accuracy-aware optimization of approximate computational kernels that run on approximate hardware platforms; given a combined reliability and/or accuracy specification, it automatically selects approximate kernel operations to synthesize an approximate computation that minimizes energy consumption.
Abstract: The accuracy of an approximate computation is the distance between the result that the computation produces and the corresponding fully accurate result. The reliability of the computation is the probability that it will produce an acceptably accurate result. Emerging approximate hardware platforms provide approximate operations that, in return for reduced energy consumption and/or increased performance, exhibit reduced reliability and/or accuracy. We present Chisel, a system for reliability- and accuracy-aware optimization of approximate computational kernels that run on approximate hardware platforms. Given a combined reliability and/or accuracy specification, Chisel automatically selects approximate kernel operations to synthesize an approximate computation that minimizes energy consumption while satisfying its reliability and accuracy specification. We evaluate Chisel on five applications from the image processing, scientific computing, and financial analysis domains. The experimental results show that our implemented optimization algorithm enables Chisel to optimize our set of benchmark kernels to obtain energy savings from 8.7% to 19.8% compared to the fully reliable kernel implementations while preserving important reliability guarantees.

129 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: The ALL(*) parsing strategy is introduced, which combines the simplicity, efficiency, and predictability of conventional top-down LL(k) parsers with the power of a GLR-like mechanism to make parsing decisions.
Abstract: Despite the advances made by modern parsing strategies such as PEG, LL(*), GLR, and GLL, parsing is not a solved problem. Existing approaches suffer from a number of weaknesses, including difficulties supporting side-effecting embedded actions, slow and/or unpredictable performance, and counter-intuitive matching strategies. This paper introduces the ALL(*) parsing strategy that combines the simplicity, efficiency, and predictability of conventional top-down LL(k) parsers with the power of a GLR-like mechanism to make parsing decisions. The critical innovation is to move grammar analysis to parse-time, which lets ALL(*) handle any non-left-recursive context-free grammar. ALL(*) is O(n^4) in theory but consistently performs linearly on grammars used in practice, outperforming general strategies such as GLL and GLR by orders of magnitude. ANTLR 4 generates ALL(*) parsers and supports direct left-recursion through grammar rewriting. Widespread ANTLR 4 use (5000 downloads/month in 2013) provides evidence that ALL(*) is effective for a wide variety of applications.
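To illustrate the ANTLR 4 workflow described above, here is a minimal, hypothetical example: a directly left-recursive expression grammar, together with the standard Java driver code. The grammar and the generated class names (Expr, ExprLexer, ExprParser) are invented for this sketch; ALL(*) prediction happens inside the generated parser at parse time.

```java
// Hypothetical ANTLR 4 grammar "Expr.g4" with direct left recursion,
// which ANTLR 4 rewrites automatically before generating an ALL(*) parser:
//
//   grammar Expr;
//   expr : expr '*' expr
//        | expr '+' expr
//        | INT ;
//   INT  : [0-9]+ ;
//   WS   : [ \t\r\n]+ -> skip ;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class ParseDemo {
    public static void main(String[] args) {
        // ExprLexer/ExprParser are the classes ANTLR 4 would generate
        // from the hypothetical grammar above.
        ExprLexer lexer = new ExprLexer(new ANTLRInputStream("1+2*3"));
        ExprParser parser = new ExprParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.expr();   // ALL(*) prediction happens here, at parse time
        System.out.println(tree.toStringTree(parser));
    }
}
```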

114 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: GPS is introduced, the first program logic to provide a full-fledged suite of modern verification techniques - including ghost state, protocols, and separation logic - for high-level, structured reasoning about weak memory.
Abstract: Weak memory models formalize the inconsistent behaviors that one can expect to observe in multithreaded programs running on modern hardware. In so doing, however, they complicate the already-difficult task of reasoning about correctness of concurrent code. Worse, they render impotent the sophisticated formal methods that have been developed to tame concurrency, which almost universally assume a strong (i.e. sequentially consistent) memory model. This paper introduces GPS, the first program logic to provide a full-fledged suite of modern verification techniques - including ghost state, protocols, and separation logic - for high-level, structured reasoning about weak memory. We demonstrate the effectiveness of GPS by applying it to challenging examples drawn from the Linux kernel as well as lock-free data structures. We also define the semantics of GPS and prove in Coq that it is sound with respect to the axiomatic C11 weak memory model.

113 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: This work presents a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery.
Abstract: Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental consequences on the analysis precision and performance. Previous work on dynamic determinacy analysis has demonstrated how information about program expressions that always resolve to a fixed value in some call context may lead to significant scalability improvements of static analysis for such code. We present a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery. The analysis combines selective context and path sensitivity, constant propagation, and branch pruning, based on a systematic investigation of the main causes of analysis imprecision when using a more basic analysis. The techniques are implemented in the TAJS analysis tool and evaluated on a collection of small programs that use jQuery. Our results show that the proposed analysis techniques boost both precision and performance, specifically for inferring type information and call graphs.

100 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: This paper presents an empirical study aiming to illuminate the relationship between the choices and settings of thread management constructs and energy consumption, and sheds light on the energy/performance trade-off of three "tuning knobs" of these constructs.
Abstract: Java programmers are faced with numerous choices in managing concurrent execution on multicore platforms. These choices often have different trade-offs (e.g., performance, scalability, and correctness guarantees). This paper analyzes an additional dimension, energy consumption. It presents an empirical study aiming to illuminate the relationship between the choices and settings of thread management constructs and energy consumption. We consider three important thread management constructs in concurrent programming: explicit thread creation, fixed-size thread pooling, and work stealing. We further shed light on the energy/performance trade-off of three "tuning knobs" of these constructs: the number of threads, the task division strategy, and the characteristics of processed data. Through an extensive experimental space exploration over real-world Java programs, we produce a list of findings about the energy behaviors of concurrent programs, which are not always obvious. The study serves as a first step toward improving energy efficiency of concurrent programs on parallel architectures.
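To make the three constructs concrete, the following minimal Java sketch expresses the same task (summing an array) with explicit thread creation, a fixed-size thread pool, and work stealing via the fork/join framework. The class, task sizes, and threshold are invented for illustration and are not taken from the study's benchmarks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Illustrative only: the same array sum written with the three constructs
// compared in the study (explicit threads, fixed-size pool, work stealing).
public class ThreadConstructDemo {
    static final long[] DATA = new long[1 << 22];

    // (1) Explicit thread creation: one thread per chunk
    //     (assumes n divides DATA.length, for brevity).
    static long explicitThreads(int n) throws InterruptedException {
        long[] partial = new long[n];
        Thread[] ts = new Thread[n];
        int chunk = DATA.length / n;
        for (int i = 0; i < n; i++) {
            final int id = i;
            ts[i] = new Thread(() -> {
                long s = 0;
                for (int j = id * chunk; j < (id + 1) * chunk; j++) s += DATA[j];
                partial[id] = s;
            });
            ts[i].start();
        }
        long total = 0;
        for (int i = 0; i < n; i++) { ts[i].join(); total += partial[i]; }
        return total;
    }

    // (2) Fixed-size thread pool: submit one task per chunk.
    static long fixedPool(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        int chunk = DATA.length / n;
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            futures.add(pool.submit(() -> {
                long s = 0;
                for (int j = id * chunk; j < (id + 1) * chunk; j++) s += DATA[j];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : futures) total += f.get();
        pool.shutdown();
        return total;
    }

    // (3) Work stealing: a RecursiveTask run on a ForkJoinPool.
    static class SumTask extends RecursiveTask<Long> {
        final int lo, hi;
        SumTask(int lo, int hi) { this.lo = lo; this.hi = hi; }
        protected Long compute() {
            if (hi - lo <= 10_000) {          // task division threshold ("tuning knob")
                long s = 0;
                for (int j = lo; j < hi; j++) s += DATA[j];
                return s;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(lo, mid);
            left.fork();
            return new SumTask(mid, hi).compute() + left.join();
        }
    }

    static long workStealing() {
        return new ForkJoinPool().invoke(new SumTask(0, DATA.length));
    }
}
```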

84 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: The authors conduct an empirical study to understand how performance problems are observed and reported by real-world users, and show that statistical debugging is a natural fit for diagnosing performance problems, which are often observed through comparison-based approaches and reported together with both good and bad inputs.
Abstract: Design and implementation defects that lead to inefficient computation widely exist in software. These defects are difficult to avoid and discover. They lead to severe performance degradation and energy waste during production runs, and are becoming increasingly critical with the meager increase of single-core hardware performance and the increasing concerns about energy constraints. Effective tools that diagnose performance problems and point out the inefficiency root cause are sorely needed. The state of the art of performance diagnosis is preliminary. Profiling can identify the functions that consume the most computation resources, but can neither identify the ones that waste the most resources nor explain why. Performance-bug detectors can identify specific types of inefficient computation, but are not suited for diagnosing general performance problems. Effective failure diagnosis techniques, such as statistical debugging, have been proposed for functional bugs. However, whether they work for performance problems is still an open question. In this paper, we first conduct an empirical study to understand how performance problems are observed and reported by real-world users. Our study shows that statistical debugging is a natural fit for diagnosing performance problems, which are often observed through comparison-based approaches and reported together with both good and bad inputs. We then thoroughly investigate different design points in statistical debugging, including three different predicates and two different types of statistical models, to understand which design point works the best for performance diagnosis. Finally, we study how the unique nature of performance bugs allows sampling techniques to lower the overhead of run-time performance diagnosis without extending the diagnosis latency.
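A minimal sketch of the statistical-debugging idea the study builds on: count how often each instrumented predicate is observed in runs labeled bad (slow) versus good, then rank predicates by how strongly they correlate with the bad runs. The scoring formula below is a generic one chosen for illustration, not necessarily one of the predicates or statistical models the paper evaluates.

```java
import java.util.*;

// Illustrative sketch: rank instrumented predicates (e.g., "branch b taken")
// by how strongly they correlate with bad (slow) runs versus good runs.
public class PredicateRanker {
    static class Counts { int inBad, inGood; }

    private final Map<String, Counts> counts = new HashMap<>();

    // Record that predicate 'p' was observed true in one run.
    void observe(String p, boolean badRun) {
        Counts c = counts.computeIfAbsent(p, k -> new Counts());
        if (badRun) c.inBad++; else c.inGood++;
    }

    // Rank predicates by Failure(p) = bad / (bad + good), highest first.
    List<String> ranked() {
        List<String> ps = new ArrayList<>(counts.keySet());
        ps.sort(Comparator.comparingDouble((String p) -> {
            Counts c = counts.get(p);
            return -((double) c.inBad / (c.inBad + c.inGood));
        }));
        return ps;
    }
}
```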

84 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: Phosphor is presented, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves the goals of performance, soundness, precision, and portability, and is the first portable general purpose taint tracker for the JVM.
Abstract: Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to data, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, soundness, and portability. Performance can be critical, as these systems are often intended to be deployed to production environments, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be sound and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, soundness, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two successive revisions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its performance to be impressive: as low as 3% (53% on average; 220% at worst) using the DaCapo macro benchmark suite. This paper describes our approach toward achieving portable taint tracking in the JVM.
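The following miniature is not Phosphor's API; it is a hand-written illustration of what dynamic taint tracking does. Phosphor achieves the same effect transparently, by rewriting JVM bytecode so that every value carries and propagates its taint labels without any source-level wrapper like this one.

```java
import java.util.*;

// Hand-written miniature of taint tracking: values carry labels, and the
// result of an operation is tainted by the labels of both operands.
public class TaintDemo {
    static final class TaintedInt {
        final int value;
        final Set<String> labels;
        TaintedInt(int value, Set<String> labels) {
            this.value = value;
            this.labels = labels;
        }
        static TaintedInt source(int value, String label) {
            return new TaintedInt(value, Collections.singleton(label));
        }
        // Data-flow propagation: merge the labels of both operands.
        TaintedInt plus(TaintedInt other) {
            Set<String> merged = new HashSet<>(labels);
            merged.addAll(other.labels);
            return new TaintedInt(value + other.value, merged);
        }
    }

    public static void main(String[] args) {
        TaintedInt userInput = TaintedInt.source(42, "untrusted-request");
        TaintedInt constant  = new TaintedInt(8, Collections.emptySet());
        TaintedInt result    = userInput.plus(constant);
        // A sink (e.g., an SQL query builder) can inspect result.labels
        // before using result.value.
        System.out.println(result.value + " tainted by " + result.labels);
    }
}
```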

76 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: A relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values is designed and demonstrated that for a range of asynchronous graph algorithms and PDE solvers, this approach outperforms algorithms based upon prior relaxed memory models.
Abstract: Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences and allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring vertices. We observe that on distributed shared memory systems, by converting synchronous algorithms into their asynchronous counterparts, algorithms can be made tolerant to high inter-node communication latency. However, high inter-node communication latency can lead to excessive use of stale values causing an increase in the number of iterations required by the algorithms to converge. Although by using bounded staleness we can restrict the slowdown in the rate of convergence, this also restricts the ability to tolerate communication latency. In this paper we design a relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values. This is achieved via a coordinated use of a best-effort refresh policy and bounded staleness. We demonstrate that for a range of asynchronous graph algorithms and PDE solvers, on average, our approach outperforms algorithms based upon: prior relaxed memory models that allow stale values by at least 2.27x; and the Bulk Synchronous Parallel (BSP) model by 4.2x. We also show that our approach frequently outperforms GraphLab, a popular distributed graph processing framework.
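As a rough, single-machine sketch of the idea (not the paper's distributed consistency protocol), an asynchronous vertex-centric update can read possibly stale neighbor values while enforcing a staleness bound:

```java
// Illustrative shape of an asynchronous vertex-centric update that tolerates
// stale neighbor reads, with bounded staleness; names and the update rule
// are invented for this sketch.
class Vertex {
    volatile double value;    // may be read stale by neighbors
    volatile long version;    // bumped on every update
    int[] neighbors;          // indices into the graph's vertex array
}

class AsyncGraph {
    Vertex[] vertices;
    static final long STALENESS_BOUND = 4;   // maximum allowed version lag

    // Jacobi-style averaging update that tolerates stale neighbor reads.
    void update(int v, long currentIteration) {
        Vertex self = vertices[v];
        double sum = 0;
        for (int n : self.neighbors) {
            Vertex nb = vertices[n];
            // Bounded staleness: if a neighbor is too far behind, wait for
            // (or, in a distributed setting, request) a fresher value.
            while (currentIteration - nb.version > STALENESS_BOUND) {
                Thread.yield();   // placeholder for "refresh from owner node"
            }
            sum += nb.value;      // possibly stale, but within the bound
        }
        self.value = sum / Math.max(1, self.neighbors.length);
        self.version = currentIteration;
    }
}
```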

66 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: This work explores the design space bottom-up, teasing apart inherent from accidental complexities while fully mechanizing the authors' models at each step, and presents DOT (Dependent Object Types), a calculus that captures the essence of path-dependent types.
Abstract: A scalable programming language is one in which the same concepts can describe small as well as large parts. Towards this goal, Scala unifies concepts from object and module systems. An essential ingredient of this unification is the concept of objects with type members, which can be referenced through path-dependent types. Unfortunately, path-dependent types are not well-understood, and have been a roadblock in grounding the Scala type system on firm theory. We study several calculi for path-dependent types. We present DOT which captures the essence - DOT stands for Dependent Object Types. We explore the design space bottom-up, teasing apart inherent from accidental complexities, while fully mechanizing our models at each step. Even in this simple setting, many interesting patterns arise from the interaction of structural and nominal features. Whereas our simple calculus enjoys many desirable and intuitive properties, we demonstrate that the theory gets much more complicated once we add another Scala feature, type refinement, or extend the subtyping relation to a lattice. We discuss possible remedies and trade-offs in modeling type systems for Scala-like languages.

60 citations


Proceedings ArticleDOI
15 Oct 2014
TL;DR: Tardis provides affordable time-travel with an average overhead of only 7% during normal execution, a rate of 0.6MB/s of history logging, and a worst-case 0.68s time-travel latency on the authors' benchmark applications, making Tardis suitable for use as the default debugger for managed languages.
Abstract: Developers who set a breakpoint a few statements too late or who are trying to diagnose a subtle bug from a single core dump often wish for a time-traveling debugger. The ability to rewind time to see the exact sequence of statements and program values leading to an error has great intuitive appeal but, due to large time and space overheads, time traveling debuggers have seen limited adoption. A managed runtime, such as the Java JVM or a JavaScript engine, has already paid much of the cost of providing core features - type safety, memory management, and virtual IO - that can be reused to implement a low overhead time-traveling debugger. We leverage this insight to design and build affordable time-traveling debuggers for managed languages. Tardis realizes our design: it provides affordable time-travel with an average overhead of only 7% during normal execution, a rate of 0.6MB/s of history logging, and a worst-case 0.68s time-travel latency on our benchmark applications. Tardis can also debug optimized code using time-travel to reconstruct state. This capability, coupled with its low overhead, makes Tardis suitable for use as the default debugger for managed languages, promising to bring time-traveling debugging into the mainstream and transform the practice of debugging.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: The approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations; then, guided by rewrite rules, the system searches a space of equivalent programs for an effective MapReduce implementation.
Abstract: We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called Mold. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated Mold on several real-world kernels and found that in most cases Mold generated the desired MapReduce program, even for codes with complex indirect updates.
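The following hand-written example illustrates the kind of rewrite Mold automates, using word count: an imperative Java loop with a loop-carried map update, and a functional version in which the update becomes a group-by (reduceByKey) on the Spark Java API. It is only a sketch of the transformation, not Mold's actual output.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountDemo {

    // Sequential, imperative version: a loop with an indirect map update.
    static Map<String, Integer> sequential(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Functional/MapReduce version: the loop-carried update on 'counts'
    // becomes a group-by (reduceByKey), enabling parallel execution.
    static Map<String, Integer> parallel(JavaSparkContext sc, List<String> words) {
        return sc.parallelize(words)
                 .mapToPair(w -> new Tuple2<>(w, 1))
                 .reduceByKey((a, b) -> a + b)
                 .collectAsMap();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(sequential(words));
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("demo").setMaster("local"));
        System.out.println(parallel(sc, words));
        sc.stop();
    }
}
```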

Proceedings ArticleDOI
15 Oct 2014
TL;DR: This paper shows that HHVM discovers latent types by structuring its JIT around the concept of a tracelet, and that this approach enables HHVM to achieve high levels of performance without sacrificing compatibility or interactivity.
Abstract: The HipHop Virtual Machine (HHVM) is a JIT compiler and runtime for PHP. While PHP values are dynamically typed, real programs often have latent types that are useful for optimization once discovered. Some types can be proven through static analysis, but limitations in the ahead-of-time approach leave some types to be discovered at run time. And even though many values have latent types, PHP programs can also contain polymorphic variables and expressions, which must be handled without catastrophic slowdown. HHVM discovers latent types by structuring its JIT around the concept of a tracelet. A tracelet is approximately a basic block specialized for a particular set of run-time types for its input values. Tracelets allow HHVM to exactly and efficiently learn the types observed by the program, while using a simple compiler. This paper shows that this approach enables HHVM to achieve high levels of performance, without sacrificing compatibility or interactivity.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: A speculative search algorithm is presented that aims to find an interleaving of the two programs with minimal abstract semantic difference; the method is unique in that it allows the analysis to dynamically alternate between several interleavings.
Abstract: We address the problem of computing semantic differences between a program and a patched version of the program. Our goal is to obtain a precise characterization of the difference between program versions, or establish their equivalence. We focus on infinite-state numerical programs, and use abstract interpretation to compute an over-approximation of program differences. Computing differences and establishing equivalence under abstraction requires abstracting relationships between variables in the two programs. Towards that end, we use a correlating abstract domain to compute a sound approximation of these relationships which captures semantic difference. This approximation can be computed over any interleaving of the two programs. However, the choice of interleaving can significantly affect precision. We present a speculative search algorithm that aims to find an interleaving of the two programs with minimal abstract semantic difference. This method is unique as it allows the analysis to dynamically alternate between several interleavings. We have implemented our approach and applied it to real-world examples including patches from Git, GNU Coreutils, as well as a few handpicked patches from the Linux kernel and the Mozilla Firefox web browser. Our evaluation shows that we compute precise approximations of semantic differences, and report few false differences.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: Rubah is the first dynamic software updating system for Java that is portable, implemented via libraries and bytecode rewriting on top of a standard JVM; is efficient, imposing essentially no overhead on normal, steady-state execution; and is non-disruptive.
Abstract: This paper presents Rubah, the first dynamic software updating system for Java that: is portable, implemented via libraries and bytecode rewriting on top of a standard JVM; is efficient, imposing essentially no overhead on normal, steady-state execution; is flexible, allowing nearly arbitrary changes to classes between updates; and is non-disruptive, employing either a novel eager algorithm that transforms the program state with multiple threads, or a novel lazy algorithm that transforms objects as they are demanded, post-update. Requiring little programmer effort, Rubah has been used to dynamically update five long-running applications: the H2 database, the Voldemort key-value store, the Jake2 implementation of the Quake 2 shooter game, the CrossFTP server, and the JavaEmailServer.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: The authors present a novel, automatic, scalable, and directed approach that identifies deadlock-indicating properties from single-threaded executions and synthesizes deadlock-revealing multithreaded tests, resulting in the detection of a number of deadlocks in classes documented as thread-safe.
Abstract: Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs, including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is significantly harder than developing sequential tests for obvious reasons. In this paper, we address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single-threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a deadlock-revealing multithreaded test. The input to our approach is the library implementation under consideration and the output is a set of deadlock-revealing multithreaded tests. We have implemented our approach as part of a tool, named OMEN. OMEN is able to synthesize multithreaded tests on many multithreaded Java libraries. Applying a dynamic deadlock detector on the execution of the synthesized tests results in the detection of a number of deadlocks, including 35 real deadlocks in classes documented as thread-safe. Moreover, our experimental results show that dynamic analysis on multithreaded tests that are either synthesized randomly or developed by third-party programmers is ineffective in detecting the deadlocks.
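For illustration, the sketch below shows the classic lock-ordering pattern that such deadlock-revealing tests target, together with the kind of two-threaded test that can drive it into a deadlock. It is a hand-written example, not a test generated by OMEN.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative only: a "thread-safe" class whose transfer method acquires
// two locks in caller-defined order, plus a two-threaded test that can
// expose the resulting deadlock.
public class DeadlockExample {

    static class Account {
        private long balance;
        Account(long balance) { this.balance = balance; }
        synchronized void transferTo(Account other, long amount) {
            synchronized (other) {           // nested lock, order depends on caller
                this.balance -= amount;
                other.balance += amount;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(100), b = new Account(100);
        CountDownLatch start = new CountDownLatch(1);

        // Thread 1 locks a then b; thread 2 locks b then a: a cyclic wait
        // (and thus a deadlock) is possible under the right interleaving.
        Thread t1 = new Thread(() -> { await(start); a.transferTo(b, 10); });
        Thread t2 = new Thread(() -> { await(start); b.transferTo(a, 10); });
        t1.start(); t2.start();
        start.countDown();
        t1.join(); t2.join();   // may never return if the deadlock manifests
    }

    static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```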

Proceedings ArticleDOI
15 Oct 2014
TL;DR: This work presents a pragmatic approach to check correctness of TypeScript declaration files with respect to JavaScript library implementations, finding 142 errors in the declaration files of 10 libraries, with an analysis time of a few minutes per library and with a low number of false positives.
Abstract: The TypeScript programming language adds optional types to JavaScript, with support for interaction with existing JavaScript libraries via interface declarations. Such declarations have been written for hundreds of libraries, but they can be difficult to write and often contain errors, which may affect the type checking and misguide code completion for the application code in IDEs. We present a pragmatic approach to check correctness of TypeScript declaration files with respect to JavaScript library implementations. The key idea in our algorithm is that many declaration errors can be detected by an analysis of the library initialization state combined with a light-weight static analysis of the library function code. Our experimental results demonstrate the effectiveness of the approach: it has found 142 errors in the declaration files of 10 libraries, with an analysis time of a few minutes per library and with a low number of false positives. Our analysis of how programmers use library interface declarations furthermore reveals some practical limitations of the TypeScript type system.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: This paper presents EventBreak, a performance-guided test generation technique to identify and analyze event handlers whose execution time may gradually increase while using the application, and shows that EventBreak helps in testing that event handlers avoid such problems by bounding a handler's execution time.
Abstract: Event-driven user interface applications typically have a single thread of execution that processes event handlers in response to input events triggered by the user, the network, or other applications. Programmers must ensure that event handlers terminate after a short amount of time because otherwise, the application may become unresponsive. This paper presents EventBreak, a performance-guided test generation technique to identify and analyze event handlers whose execution time may gradually increase while using the application. The key idea is to systematically search for pairs of events where triggering one event increases the execution time of the other event. For example, this situation may happen because one event accumulates data that is processed by the other event. We implement the approach for JavaScript-based web applications and apply it to three real-world applications. EventBreak discovers events with an execution time that gradually increases in an unbounded way, which makes the application unresponsive, and events that, if triggered repeatedly, reveal a severe scalability problem, which makes the application unusable. The approach reveals two known bugs and four previously unknown responsiveness problems. Furthermore, we show that EventBreak helps in testing that event handlers avoid such problems by bounding a handler's execution time.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: The authors present CheckCell, the first data debugging tool, as an add-in for Microsoft Excel; they show that CheckCell is both analytically and empirically fast and effective, and that it automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.
Abstract: Testing and static analysis can help root out bugs in programs, but not in data. This paper introduces data debugging, an approach that combines program analysis and statistical analysis to automatically find potential data errors. Since it is impossible to know a priori whether data are erroneous, data debugging instead locates data that has a disproportionate impact on the computation. Such data is either very important, or wrong. Data debugging is especially useful in the context of data-intensive programming environments that intertwine data with programs in the form of queries or formulas. We present the first data debugging tool, CheckCell, an add-in for Microsoft Excel. CheckCell identifies cells that have an unusually high impact on the spreadsheet's computations. We show that CheckCell is both analytically and empirically fast and effective. We show that it successfully finds injected typographical errors produced by a generative model trained with data entry from 169,112 Mechanical Turk tasks. CheckCell is more precise and efficient than standard outlier detection techniques. CheckCell also automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: A new algorithm, SplitMix, derived from the DotMix algorithm of Leiserson, Schardl, and Sukha, is described; it is faster and produces pseudorandom sequences of higher quality, and it is far superior in quality and speed to java.util.Random (it has been included in JDK 8 as java.util.SplittableRandom).
Abstract: We describe a new algorithm SplitMix for an object-oriented and splittable pseudorandom number generator (PRNG) that is quite fast: 9 64-bit arithmetic/logical operations per 64 bits generated. A conventional linear PRNG object provides a generate method that returns one pseudorandom value and updates the state of the PRNG, but a splittable PRNG object also has a second operation, split, that replaces the original PRNG object with two (seemingly) independent PRNG objects, by creating and returning a new such object and updating the state of the original object. Splittable PRNG objects make it easy to organize the use of pseudorandom numbers in multithreaded programs structured using fork-join parallelism. No locking or synchronization is required (other than the usual memory fence immediately after object creation). Because the generate method has no loops or conditionals, it is suitable for SIMD or GPU implementation. We derive SplitMix from the DotMix algorithm of Leiserson, Schardl, and Sukha by making a series of program transformations and engineering improvements. The end result is an object-oriented version of the purely functional API used in the Haskell library for over a decade, but SplitMix is faster and produces pseudorandom sequences of higher quality; it is also far superior in quality and speed to java.util.Random, and has been included in Java JDK8 as the class java.util.SplittableRandom. We have tested the pseudorandom sequences produced by SplitMix using two standard statistical test suites (DieHarder and TestU01) and they appear to be adequate for "everyday" use, such as in Monte Carlo algorithms and randomized data structures where speed is important.
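A small usage sketch of java.util.SplittableRandom with fork/join parallelism (the Monte Carlo example itself is invented for illustration): each forked subtask receives its own statistically independent generator via split(), so no locking or shared generator state is needed.

```java
import java.util.SplittableRandom;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Estimate pi by sampling points in the unit square; each forked subtask
// gets its own SplittableRandom stream via split().
public class MonteCarloPi extends RecursiveTask<Long> {
    private final SplittableRandom rng;
    private final long samples;

    MonteCarloPi(SplittableRandom rng, long samples) {
        this.rng = rng;
        this.samples = samples;
    }

    @Override
    protected Long compute() {
        if (samples <= 1_000_000) {
            long hits = 0;
            for (long i = 0; i < samples; i++) {
                double x = rng.nextDouble(), y = rng.nextDouble();
                if (x * x + y * y <= 1.0) hits++;
            }
            return hits;
        }
        // Give the forked half its own generator; keep ours for the other half.
        MonteCarloPi left = new MonteCarloPi(rng.split(), samples / 2);
        left.fork();
        return new MonteCarloPi(rng, samples - samples / 2).compute() + left.join();
    }

    public static void main(String[] args) {
        long total = 100_000_000L;
        long hits = new ForkJoinPool().invoke(new MonteCarloPi(new SplittableRandom(), total));
        System.out.println("pi ~= " + 4.0 * hits / total);
    }
}
```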

Proceedings ArticleDOI
15 Oct 2014
TL;DR: Two static analyses are designed and tested - symbolic region and range analysis - which are combined to remove the majority of the runtime guards against out-of-bounds memory accesses, generating code that is 17% faster and 9% more energy efficient than the code originally produced by AddressSanitizer.
Abstract: The C programming language does not prevent out-of-bounds memory accesses. There exist several techniques to secure C programs; however, these methods tend to slow down these programs substantially, because they populate the binary code with runtime checks. To deal with this problem, we have designed and tested two static analyses - symbolic region and range analysis - which we combine to remove the majority of these guards. In addition to the analyses themselves, we bring two other contributions. First, we describe live range splitting strategies that improve the efficiency and the precision of our analyses. Second, we show how to deal with integer overflows, a phenomenon that can compromise the correctness of static algorithms that validate memory accesses. We validate our claims by incorporating our findings into AddressSanitizer. We generate SPEC CINT 2006 code that is 17% faster and 9% more energy efficient than the code produced originally by this tool. Furthermore, our approach is 50% more effective than Pentagons, a state-of-the-art analysis to sanitize memory accesses.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: An experimental survey of the present state of energy management across the stack, measuring the total energy, average power, and execution time of 41 benchmark applications in 220 configurations, across a total of 200,000 program executions; it finds that effective parallelization and compiler optimizations have the potential to save far more energy than Linux's frequency tuning algorithms.
Abstract: Modern demand for energy-efficient computation has spurred research at all levels of the stack, from devices to microarchitecture, operating systems, compilers, and languages. Unfortunately, this breadth has resulted in a disjointed space, with technologies at different levels of the system stack rarely compared, let alone coordinated. This work begins to remedy the problem, conducting an experimental survey of the present state of energy management across the stack. Focusing on settings that are exposed to software, we measure the total energy, average power, and execution time of 41 benchmark applications in 220 configurations, across a total of 200,000 program executions. Some of the more important findings of the survey include that effective parallelization and compiler optimizations have the potential to save far more energy than Linux's frequency tuning algorithms; that certain non-complementary energy strategies can undercut each other's savings by half when combined; and that while the power impacts of most strategies remain constant across applications, the runtime impacts vary, resulting in inconsistent energy impacts.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: An n-gram model for programming languages is proposed and used to implement a plagiarism detector whose accuracy beats previous techniques, as well as a bug-finding tool that discovered over a dozen previously unknown bugs in a collection of real deployed programs.
Abstract: Several program analysis tools - such as plagiarism detection and bug finding - rely on knowing a piece of code's relative semantic importance. For example, a plagiarism detector should not bother reporting two programs that have an identical simple loop counter test, but should report programs that share more distinctive code. Traditional program analysis techniques (e.g., finding data and control dependencies) are useful, but do not say how surprising or common a line of code is. Natural language processing researchers have encountered a similar problem and addressed it using an n-gram model of text frequency, derived from statistics computed over text corpora. We propose and compute an n-gram model for programming languages, computed over a corpus of 2.8 million JavaScript programs we downloaded from the Web. In contrast to previous techniques, we describe a code n-gram as a subgraph of the program dependence graph that contains all nodes and edges reachable in n steps from the statement. We can count n-grams in a program and count the frequency of n-grams in the corpus, enabling us to compute tf-idf-style measures that capture the differing importance of different lines of code. We demonstrate the power of this approach by implementing a plagiarism detector with accuracy that beats previous techniques, and a bug-finding tool that discovered over a dozen previously unknown bugs in a collection of real deployed programs.
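A simplified sketch of the tf-idf-style scoring described above, treating an n-gram as an opaque string key; in the paper an n-gram is a subgraph of the program dependence graph, but the scoring idea is the same: code that is frequent in a program yet rare in the corpus is distinctive, while common boilerplate scores low.

```java
import java.util.Map;

// Simplified tf-idf scoring over code n-grams represented as string keys.
public class NgramScorer {
    // documentFrequency.get(g) = number of corpus programs containing n-gram g.
    private final Map<String, Integer> documentFrequency;
    private final int corpusSize;

    NgramScorer(Map<String, Integer> documentFrequency, int corpusSize) {
        this.documentFrequency = documentFrequency;
        this.corpusSize = corpusSize;
    }

    // High score: frequent within this program but rare in the corpus
    // (distinctive code). Low score: corpus-wide boilerplate, e.g. a
    // simple loop counter test.
    double score(String ngram, Map<String, Integer> programCounts) {
        int tf = programCounts.getOrDefault(ngram, 0);
        int df = documentFrequency.getOrDefault(ngram, 0);
        double idf = Math.log((double) (corpusSize + 1) / (df + 1));
        return tf * idf;
    }
}
```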

Proceedings ArticleDOI
15 Oct 2014
TL;DR: Free Lunch is proposed, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock.
Abstract: Today, Java is regularly used to implement large multi-threaded server-class applications that use locks to protect access to shared data. However, understanding the impact of locks on the performance of a system is complex, and thus the use of locks can impede the progress of threads on configurations that were not anticipated by the developer, during specific phases of the execution. In this paper, we propose Free Lunch, a new lock profiler for Java application servers, specifically designed to identify, in-vivo, phases where the progress of the threads is impeded by a lock. Free Lunch is designed around a new metric, critical section pressure (CSP), which directly correlates the progress of the threads to each of the locks. Using Free Lunch, we have identified phases of high CSP, which were hidden with other lock profilers, in the distributed Cassandra NoSQL database and in several applications from the DaCapo 9.12, the SPECjvm2008 and the SPECjbb2005 benchmark suites. Our evaluation of Free Lunch shows that its overhead is never greater than 6%, making it suitable for in-vivo use.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: A novel reduction scheme for asynchronous event-driven programs that finds almost-synchronous invariants - invariants consisting of global states where message buffers are close to empty and simultaneously argues that they cover all local states.
Abstract: We consider the problem of provably verifying that an asynchronous message-passing system satisfies its local assertions. We present a novel reduction scheme for asynchronous event-driven programs that finds almost-synchronous invariants - invariants consisting of global states where message buffers are close to empty. The reduction finds almost-synchronous invariants and simultaneously argues that they cover all local states. We show that asynchronous programs often have almost-synchronous invariants and that we can exploit this to build natural proofs that they are correct. We implement our reduction strategy, which is sound and complete, and show that it is more effective in proving programs correct as well as more efficient in finding bugs in several programs, compared to current search strategies which almost always diverge. The high point of our experiments is that our technique can prove the Windows Phone USB Driver written in P [9] correct for the responsiveness property, which was hitherto not provable using state-of-the-art model-checkers.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: The approach works by reducing the search for minimum error sources to an optimization problem that is formulated in terms of weighted maximum satisfiability modulo theories (MaxSMT), which allows it to build on SMT solvers to support rich type systems and at the same time abstract from the concrete criterion that is used for ranking the error sources.
Abstract: Automatic type inference is a popular feature of functional programming languages. If a program cannot be typed, the compiler typically reports a single program location in its error message. This location is the point where the type inference failed, but not necessarily the actual source of the error. Other potential error sources are not even considered. Hence, the compiler often misses the true error source, which increases debugging time for the programmer. In this paper, we present a general framework for automatic localization of type errors. Our algorithm finds all minimum error sources, where the exact definition of minimum is given in terms of a compiler-specific ranking criterion. Compilers can use minimum error sources to produce more meaningful error reports, and for automatic error correction. Our approach works by reducing the search for minimum error sources to an optimization problem that we formulate in terms of weighted maximum satisfiability modulo theories (MaxSMT). The reduction to weighted MaxSMT allows us to build on SMT solvers to support rich type systems and at the same time abstract from the concrete criterion that is used for ranking the error sources. We have implemented an instance of our framework targeted at Hindley-Milner type systems and evaluated it on existing OCaml benchmarks for type error localization. Our evaluation shows that our approach has the potential to significantly improve the quality of type error reports produced by state of the art compilers.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: Distributed REScala is proposed, a reactive language with a change propagation algorithm that works without centralized knowledge about the topology of the dependency structure among reactive values and avoids unnecessary propagation of changes, while retaining safety guarantees (glitch freedom).
Abstract: Reactive programming improves the design of reactive applications by relocating the logic for managing dependencies between dependent values away from the application logic to the language implementation. Many distributed applications are reactive. Yet, existing change propagation algorithms are not suitable in a distributed setting. We propose Distributed REScala, a reactive language with a change propagation algorithm that works without centralized knowledge about the topology of the dependency structure among reactive values and avoids unnecessary propagation of changes, while retaining safety guarantees (glitch freedom). Distributed REScala enables distributed reactive programming, bringing the benefits of reactive programming to distributed applications. We demonstrate the enabled design improvements by a case study. We also empirically evaluate the performance of our algorithm in comparison to other algorithms in a simulated distributed setting.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: Coq verification of a compiler for a high-level language is described, such that the compiler correctness theorem allows the authors to derive partial-correctness Hoare-logic theorems for programs built by linking the assembly code output by their compiler with assembly code produced by other means.
Abstract: Many real programs are written in multiple different programming languages, and supporting this pattern creates challenges for formal compiler verification. We describe our Coq verification of a compiler for a high-level language, such that the compiler correctness theorem allows us to derive partial-correctness Hoare-logic theorems for programs built by linking the assembly code output by our compiler and assembly code produced by other means. Our compiler supports such tricky features as storable cross-language function pointers, without giving up the usual benefits of being able to verify different compiler phases (including, in our case, two classic optimizations) independently. The key technical innovation is a mixed operational and axiomatic semantics for the source language, with a built-in notion of abstract data types, such that compiled code interfaces with other languages only through axiomatically specified methods that mutate encapsulated private data, represented in whatever formats are most natural for those languages.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: This work proposes the first dynamic approach for automated migration of build scripts to new build systems and implements the proposed two-phase migration approach in a tool called METAMORPHOSIS that has been recently used at Microsoft.
Abstract: The efficiency of a build system is an important factor for developer productivity. As a result, developer teams have been increasingly adopting new build systems that allow higher build parallelization. However, migrating the existing legacy build scripts to new build systems is a tedious and error-prone process. Unfortunately, there is insufficient support for automated migration of build scripts, making the migration more problematic. We propose the first dynamic approach for automated migration of build scripts to new build systems. Our approach works in two phases. First, from a set of execution traces, we synthesize build scripts that accurately capture the intent of the original build. The synthesized build scripts are typically long and hard to maintain. Second, we apply refactorings that raise the abstraction level of the synthesized scripts (e.g., introduce functions for similar fragments). As different refactoring sequences may lead to different build scripts, we use a search-based approach that explores various sequences to identify the best (e.g., shortest) build script. We optimize search-based refactoring with partial-order reduction to explore refactoring sequences faster. We implemented the proposed two-phase migration approach in a tool called METAMORPHOSIS that has been recently used at Microsoft.

Proceedings ArticleDOI
15 Oct 2014
TL;DR: It is argued that this can enable more efficient symbolic exploration of deep code paths in multithreaded programs by allowing the symbolic engine to jump directly to program contexts of interest while avoiding some common causes of infeasible-path explosion.
Abstract: We describe an algorithm to perform symbolic execution of a multithreaded program starting from an arbitrary program context. We argue that this can enable more efficient symbolic exploration of deep code paths in multithreaded programs by allowing the symbolic engine to jump directly to program contexts of interest. The key challenge is modeling the initial context with reasonable precision - an overly approximate model leads to exploration of many infeasible paths during symbolic execution, while a very precise model would be so expensive to compute that computing it would defeat the purpose of jumping directly to the initial context in the first place. We propose a context-specific dataflow analysis that approximates the initial context cheaply, but precisely enough to avoid some common causes of infeasible-path explosion. This model is necessarily approximate - it may leave portions of the memory state unconstrained, leaving our symbolic execution unable to answer simple questions such as "which thread holds lock A?". For such cases, we describe a novel algorithm for evaluating symbolic synchronization during symbolic execution. Our symbolic execution semantics are sound and complete up to the limits of the underlying SMT solver. We describe initial experiments on an implementation in Cloud 9.