
Showing papers in "ACM Letters on Programming Languages and Systems in 1992"


Journal ArticleDOI
TL;DR: It is shown that two variations of each type of race exist: feasible general races and data races, and that locating feasible races is an NP-hard problem, implying that only the apparent races can be detected in practice.
Abstract: In shared-memory parallel programs that use explicit synchronization, race conditions result when accesses to shared memory are not properly synchronized. Race conditions are often considered to be manifestations of bugs, since their presence can cause the program to behave unexpectedly. Unfortunately, there has been little agreement in the literature as to precisely what constitutes a race condition. Two different notions have been implicitly considered: one pertaining to programs intended to be deterministic (which we call general races) and the other to nondeterministic programs containing critical sections (which we call data races). However, the differences between general races and data races have not yet been recognized. This paper examines these differences by characterizing races using a formal model and exploring their properties. We show that two variations of each type of race exist: feasible general races and data races capture the intuitive notions desired for debugging and apparent races capture less accurate notions implicitly assumed by most dynamic race detection methods. We also show that locating feasible races is an NP-hard problem, implying that only the apparent races, which are approximations to feasible races, can be detected in practice. The complexity of dynamically locating apparent races depends on the type of synchronization used by the program. Apparent races can be exhaustively located efficiently only for weak types of synchronization that are incapable of implementing mutual exclusion. This result has important implications since we argue that debugging general races requires exhaustive race detection and is inherently harder than debugging data races (which requires only partial race detection). Programs containing data races can therefore be efficiently debugged by locating certain easily identifiable races. In contrast, programs containing general races require more complex debugging techniques.

471 citations
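The paper's "apparent races" are what dynamic detectors can actually find. A minimal lockset-style check, sketched below in Python (our illustration, not the paper's formal model), flags an apparent data race whenever two threads touch the same variable, at least one access is a write, and the accesses hold no lock in common:

```python
# A minimal Eraser-style lockset check for *apparent* data races.
# The trace format and all names are our invention for illustration.
def apparent_races(events):
    # events: list of (thread, op, var, locks_held), op in {"r", "w"}
    races = set()
    for i, (t1, op1, v1, l1) in enumerate(events):
        for t2, op2, v2, l2 in events[i + 1:]:
            if (t1 != t2 and v1 == v2 and "w" in (op1, op2)
                    and not set(l1) & set(l2)):
                races.add(v1)
    return races

trace = [
    ("T1", "w", "x", ["m"]),   # x is always accessed under lock m
    ("T2", "r", "x", ["m"]),
    ("T1", "w", "y", ["m"]),   # y is written under m but read with no lock
    ("T2", "r", "y", []),
]
print(apparent_races(trace))   # -> {'y'}
```

Such a check can report races that are infeasible in any real execution, which is exactly the feasible-versus-apparent gap the paper formalizes.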


Journal ArticleDOI
TL;DR: This paper focuses on static analysis of programs for languages with if statements, loops, dynamic storage, and recursive data structures, and examines the problems faced by such analysis under the common simplifying assumptions.
Abstract: Static analysis of programs is indispensable to any software tool, environment, or system that requires compile-time information about the semantics of programs. With the emergence of languages like C and LISP, static analysis of programs with dynamic storage and recursive data structures has become a field of active research. Such analysis is difficult, and the static-analysis community has recognized the need for simplifying assumptions and approximate solutions. However, even under the common simplifying assumptions, such analyses are harder than previously recognized. Two fundamental static-analysis problems are may alias and must alias. The former is not recursive (is undecidable), and the latter is not recursively enumerable (is uncomputable), even when all paths are executable in the program being analyzed for languages with if statements, loops, dynamic storage, and recursive data structures.

430 citations
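The two problems can be made concrete with a toy flow-sensitive points-to analysis (our sketch, not the paper's construction): p and q may alias when their points-to sets intersect, and must alias when both are the same singleton. The paper's result is that computing these answers exactly is impossible in general, so practical analyses settle for conservative approximations like this one.

```python
# A toy flow-sensitive points-to analysis over straight-line pointer code.
# Statement forms: ("p", "&x") takes an address; ("p", "q") copies a pointer.
def points_to(stmts):
    pts = {}
    for lhs, rhs in stmts:
        if rhs.startswith("&"):
            pts[lhs] = {rhs[1:]}
        else:
            pts[lhs] = set(pts.get(rhs, set()))
    return pts

def may_alias(pts, p, q):      # points-to sets overlap
    return bool(pts.get(p, set()) & pts.get(q, set()))

def must_alias(pts, p, q):     # both definitely point to the same object
    return pts.get(p) == pts.get(q) and len(pts.get(p, ())) == 1

pts = points_to([("p", "&x"), ("q", "p"), ("r", "&y")])
print(may_alias(pts, "p", "q"), must_alias(pts, "p", "q"))  # -> True True

# After a branch merges two assignments, p may point to x or to y:
pts["p"] = {"x", "y"}
print(may_alias(pts, "p", "r"), must_alias(pts, "p", "r"))  # -> True False
```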


Journal ArticleDOI
TL;DR: This paper describes a simple program that generates matchers that are fast, compact, and easy to understand and run up to 25 times faster than Twig's matchers.
Abstract: Many code-generator generators use tree pattern matching and dynamic programming. This paper describes a simple program that generates matchers that are fast, compact, and easy to understand. It is simpler than common alternatives: 200–700 lines of Icon or 950 lines of C, versus 3,000 lines of C for Twig and 5,000 for burg. Its matchers run up to 25 times faster than Twig's. They are necessarily slower than burg's BURS (bottom-up rewrite system) matchers, but they are more flexible and still practical.

222 citations
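A bottom-up, cost-driven matcher of the kind the paper describes can be sketched in a few lines of Python (our illustration with made-up rules, not the paper's generated code): each rule rewrites an operator applied to nonterminals, and dynamic programming keeps the cheapest cover of each subtree.

```python
# Rules: (result_nonterminal, pattern, cost).  A pattern is an operator
# followed by the nonterminals its children must reduce to.
RULES = [
    ("reg", ("Reg",), 0),
    ("reg", ("Imm",), 1),                 # load immediate into a register
    ("imm", ("Imm",), 0),
    ("reg", ("Plus", "reg", "reg"), 1),
    ("addr", ("Plus", "reg", "imm"), 0),  # fold add into an addressing mode
    ("reg", ("Load", "addr"), 1),
]

def label(tree):
    """Return {nonterminal: min cost} for covering `tree`."""
    op, *kids = tree
    kid_costs = [label(k) for k in kids]
    costs = {}
    for nt, pat, c in RULES:
        if pat[0] != op or len(pat) - 1 != len(kids):
            continue
        total, ok = c, True
        for want, have in zip(pat[1:], kid_costs):
            if want in have:
                total += have[want]
            else:
                ok = False
                break
        if ok and total < costs.get(nt, 1 << 30):
            costs[nt] = total
    return costs

# Load(Plus(Reg, Imm)): the addr rule folds the add into the load for free.
print(label(("Load", ("Plus", ("Reg",), ("Imm",))))["reg"])  # -> 1
```

A real generator would also handle chain rules and emit the matcher as tables rather than interpret the rules at match time.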


Journal ArticleDOI
TL;DR: This work shows how to compile Standard ML to C without compromising on portability and proper tail recursion, and analyzes the performance and determines the aspects of the compilation method that lead to the observed slowdown.
Abstract: C has been used as a portable target language for implementing languages like Standard ML and Scheme. Previous efforts at compiling these languages to C have produced efficient code, but have compromised on portability and proper tail recursion. We show how to compile Standard ML to C without making such compromises. The compilation technique is based on converting Standard ML to a continuation-passing-style λ-calculus intermediate language and then compiling this language to C. The code generated by this compiler achieves an execution speed that is about a factor of two slower than that generated by a native code compiler. The compiler generates highly portable code, yet still supports advanced features like garbage collection and first-class continuations. We analyze the performance and determine the aspects of the compilation method that lead to the observed slowdown. We also suggest changes to C compilers that would better support such compilation methods.

113 citations
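One standard way to get proper tail recursion out of a C compiler that does not guarantee tail-call optimization is a trampoline: compiled functions return a thunk for the next call instead of making it, and a driver loop keeps the stack flat. The Python sketch below illustrates the technique generically; it is not the paper's actual SML-to-C scheme.

```python
# A trampoline: "compiled" functions return either a final result or a
# zero-argument thunk for the next call, and a driver loop invokes
# thunks until a result appears.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

def countdown(n):
    # A tail call becomes "return a thunk" instead of a direct call,
    # so the stack stays flat however deep the recursion.
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)

# Far deeper than the native recursion limit would allow:
print(trampoline(countdown(1_000_000)))  # -> done
```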


Journal ArticleDOI
TL;DR: This paper presents a method for semi-automatic bug localization, generalized algorithmic debugging, which has been integrated with the category partition method for functional testing; the authors believe it is the first generalization of algorithmic debugging for programs with side effects written in imperative languages such as Pascal.
Abstract: This paper presents a method for semi-automatic bug localization, generalized algorithmic debugging, which has been integrated with the category partition method for functional testing. In this way the efficiency of the algorithmic debugging method for bug localization can be improved by using test specifications and test results. The long-range goal of this work is a semi-automatic debugging and testing system which can be used during large-scale development of nontrivial programs. The method is generally applicable to procedural languages and does not depend on any ad hoc assumptions about the subject program. The original form of algorithmic debugging, introduced by Shapiro, was limited to small Prolog programs without side effects, but has later been generalized to concurrent logic programming languages. Another drawback of the original method is the large number of interactions with the user during bug localization. To our knowledge, this is the first method which uses category partition testing to improve the bug-localization properties of algorithmic debugging. The method can avoid irrelevant questions to the programmer by categorizing input parameters and then matching these against test cases in the test database. Additionally, we use program slicing, a data-flow analysis technique, to dynamically compute which parts of the program are relevant for the search, thus further improving bug localization. We believe that this is the first generalization of algorithmic debugging for programs with side effects written in imperative languages such as Pascal. Together these improvements make it more feasible to debug larger programs.
However, additional improvements are needed to handle pointer-related side effects and concurrent Pascal programs. A prototype generalized algorithmic debugger for a Pascal subset without pointer side effects and a test-case generator for application programs in Pascal, C, dBase, and LOTUS have been implemented.

102 citations


Journal ArticleDOI
TL;DR: This article uses a link-time code modification system to analyze large, linked program modules of C++, C, and Fortran, and finds that C++ programs using an object-oriented programming style contain a large fraction of unreachable procedure code.
Abstract: Unreachable procedures are procedures that can never be invoked. Their existence may adversely affect the performance of a program. Unfortunately, their detection requires the entire program to be present. Using a link-time code modification system, we analyze large, linked program modules of C++, C, and Fortran. We find that C++ programs using an object-oriented programming style contain a large fraction of unreachable procedure code. In contrast, C and Fortran programs have a low and essentially constant fraction of unreachable code. In this article, we present our analysis of C++, C, and Fortran programs, and we discuss how object-oriented programming style generates unreachable procedures.

86 citations
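The detection problem itself is a whole-program reachability computation over the call graph. The Python sketch below (ours; the paper's tool works on linked binaries, not source) also hints at why object-oriented style inflates the numbers: a conservative analysis must keep every override of any virtually dispatched method it sees called.

```python
# Procedures not reachable from the entry point can never be invoked.
def unreachable(call_graph, entry="main"):
    seen, work = set(), [entry]
    while work:
        f = work.pop()
        if f in seen:
            continue
        seen.add(f)
        work.extend(call_graph.get(f, []))
    return set(call_graph) - seen

cg = {
    "main": ["Shape::draw"],
    "Shape::draw": ["Circle::draw", "Square::draw"],  # keep all overrides
    "Circle::draw": [],
    "Square::draw": [],
    "Shape::serialize": ["Circle::serialize"],        # never called from main
    "Circle::serialize": [],
}
print(sorted(unreachable(cg)))  # -> ['Circle::serialize', 'Shape::serialize']
```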


Journal ArticleDOI
TL;DR: The algorithm computes the possible bindings of procedure variables in languages where such variables only receive their values through parameter passing, such as Fortran, to accommodate a limited form of assignments to procedure variables.
Abstract: We present an efficient algorithm for computing the procedure call graph, the program representation underlying most interprocedural optimization techniques. The algorithm computes the possible bindings of procedure variables in languages where such variables only receive their values through parameter passing, such as Fortran. We extend the algorithm to accommodate a limited form of assignments to procedure variables. The resulting algorithm can also be used in the analysis of functional programs that have been converted to continuation-passing style. We discuss the algorithm in relation to other call graph analysis approaches. Many less efficient techniques produce essentially the same call graph. A few algorithms are more precise, but they may be prohibitively expensive depending on language features.

84 citations
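The core of such an analysis is a fixpoint over procedure-variable bindings: literal procedure arguments seed the solution, and bindings flow through call sites where one formal is passed on as an actual. A small Python sketch of that idea (ours, with invented procedure names; not the paper's algorithm) follows.

```python
# env maps (procedure, formal) -> set of procedures it may be bound to.
def solve(calls, seed):
    # calls: caller -> list of (callee, {formal: actual}); an actual is a
    # literal procedure name, or ("formal", f) passing the caller's formal on.
    env = {k: set(v) for k, v in seed.items()}
    changed = True
    while changed:
        changed = False
        for caller, sites in calls.items():
            for callee, args in sites:
                for formal, actual in args.items():
                    if isinstance(actual, tuple):
                        new = env.get((caller, actual[1]), set())
                    else:
                        new = {actual}
                    slot = env.setdefault((callee, formal), set())
                    if not new <= slot:
                        slot |= new
                        changed = True
    return env

# MAIN calls APPLY with procedures FOO and BAR; APPLY passes P on to WRAP.
calls = {
    "MAIN":  [("APPLY", {"P": "FOO"}), ("APPLY", {"P": "BAR"})],
    "APPLY": [("WRAP", {"Q": ("formal", "P")})],
}
env = solve(calls, {})
print(sorted(env[("WRAP", "Q")]))  # -> ['BAR', 'FOO']
```

Call edges through procedure variables are then read off the solution: WRAP's calls through Q go to FOO or BAR.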


Journal ArticleDOI
TL;DR: This paper discusses the representation of register pairs in a graph coloring allocator, and explains the problems that arise with Chaitin's allocator and shows how the optimistic allocator avoids them.
Abstract: Many architectures require that a program use pairs of adjacent registers to hold double-precision floating-point values. Register allocators based on Chaitin's graph-coloring technique have trouble with programs that contain both single-register values and values that require adjacent pairs of registers. In particular, Chaitin's algorithm often produces excessive spilling on such programs. This results in underuse of the register set; the extra loads and stores inserted into the program for spilling also slow execution. An allocator based on an optimistic coloring scheme naturally avoids this problem. Such allocators delay the decision to spill a value until late in the allocation process. This eliminates the over-spilling provoked by adjacent register pairs in Chaitin's scheme. This paper discusses the representation of register pairs in a graph coloring allocator. It explains the problems that arise with Chaitin's allocator and shows how the optimistic allocator avoids them. It provides a rationale for determining how to add larger aggregates to the interference graph.

67 citations
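The optimistic (Briggs-style) discipline can be seen in miniature below (a generic Python sketch, ours; it colors single registers only, whereas the paper's contribution is extending such an allocator to register pairs and larger aggregates). Every node is pushed during simplification, even when its degree is not trivially colorable, and spilling is decided only during selection.

```python
# Briggs-style optimistic graph coloring in miniature.
def color(graph, k):
    g = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while g:
        n = min(g, key=lambda x: len(g[x]))   # lowest degree first,
        stack.append((n, g.pop(n)))           # but push even if degree >= k
        for adj in g.values():
            adj.discard(n)
    colors, spills = {}, []
    while stack:
        n, neighbors = stack.pop()
        used = {colors[m] for m in neighbors if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[n] = free[0]
        else:
            spills.append(n)                  # spill decided only here
    return colors, spills

# A 4-cycle: every node has degree 2, so with k = 2 Chaitin's
# degree-<-k test would spill, yet two colors plainly suffice.
g = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
colors, spills = color(g, 2)
print(spills)  # -> []
```

Deferring the spill decision is what discovers the 2-coloring here; the analogous over-spilling is what register pairs provoke in Chaitin's scheme.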


Journal ArticleDOI
TL;DR: The structure of a program can encode implicit information that changes both the shape and speed of the generated code, and the ability of several analytical techniques to help the compiler avoid similar problems is examined.
Abstract: The structure of a program can encode implicit information that changes both the shape and speed of the generated code. Interprocedural transformations like inlining often discard such information; using interprocedural data-flow information as a basis for optimization can have the same effect. In the course of a study on inline substitution with commercial FORTRAN compilers, we encountered unexpected performance problems in one of the programs. This paper describes the specific problem that we encountered, explores its origins, and examines the ability of several analytical techniques to help the compiler avoid similar problems.

46 citations


Journal ArticleDOI
TL;DR: A hybrid CPS transformation is presented for a language with annotations resulting from strictness analysis; it is derived by symbolically composing two simpler transformations.
Abstract: Strictness analysis is a common component of compilers for call-by-name functional languages; the continuation-passing-style (CPS) transformation is a common component of compilers for call-by-value functional languages. To bridge these two implementation techniques, we present a hybrid CPS transformation for a language with annotations resulting from strictness analysis, derived by symbolically composing two simpler transformations.

38 citations


Journal ArticleDOI
TL;DR: Two transformations called inner-loop guard elimination and conservative expression substitution are introduced to enhance propagation of range checks in nested while-loops and to define a partial order on related range checks.
Abstract: Compile-time elimination of subscript range checks is performed by some optimizing compilers to reduce the overhead associated with manipulating array data structures. Elimination and propagation, the two methods of subscript range check optimization, are less effective for eliminating global redundancies especially in while-loop structures with nonconstant loop guards. This paper describes a subscript range check optimization procedure that can eliminate more range checks than current methods. Two transformations called inner-loop guard elimination and conservative expression substitution are introduced to enhance propagation of range checks in nested while-loops and to define a partial order on related range checks. Global elimination is improved by considering range checks performed before control reaches a statement and after control leaves a statement. A unique feature of this method is the simplification of the available range-check analysis system for global elimination.
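The payoff of propagation and conservative substitution can be seen in a small Python sketch (ours, not the paper's procedure): a check executed on every iteration of a while-loop with a nonconstant guard is replaced by a single conservative check before the loop.

```python
# Unoptimized: one subscript range check per iteration.
def sum_checked(a, n):
    checks = total = i = 0
    while i < n:               # nonconstant loop guard
        checks += 1
        if not 0 <= i < len(a):
            raise IndexError
        total += a[i]
        i += 1
    return total, checks

# Optimized: i ranges over [0, n), so one conservative check
# (0 <= n <= len(a)) covers every access in the loop.
def sum_hoisted(a, n):
    if not 0 <= n <= len(a):
        raise IndexError
    checks = 1
    total = i = 0
    while i < n:
        total += a[i]
        i += 1
    return total, checks

a = list(range(100))
print(sum_checked(a, 100)[1], sum_hoisted(a, 100)[1])  # -> 100 1
```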

Journal ArticleDOI
TL;DR: The design of an OSM (Object Space Manager) that allows partitioning of real memory on object, rather than page, boundaries is described, in which the worst-case stop-and-wait garbage collection delay ranges between 10 and 500 μsec, depending on the system configuration.
Abstract: Modern object-oriented languages and programming paradigms require finer-grain division of memory than is provided by traditional paging and segmentation systems. This paper describes the design of an OSM (Object Space Manager) that allows partitioning of real memory on object, rather than page, boundaries. The time required by the OSM to create an object, or to find the beginning of an object given a pointer to any location within it, is approximately one memory cycle. Object sizes are limited only by the availability of address bits. In typical configurations of object-oriented memory modules, one OSM chip is required for every 16 RAM chips. The OSM serves a central role in the implementation of a hardware-assisted garbage collection system in which the worst-case stop-and-wait garbage collection delay ranges between 10 and 500 μsec, depending on the system configuration.

Journal ArticleDOI
TL;DR: The transitive closure of the control dependence relation is characterized and an application to the theory of control flow guards is given and related to characterizations by Beck, Sarkar, and Cytron.
Abstract: We characterize the transitive closure of the control dependence relation and give an application to the theory of control flow guards. We relate our result to characterizations by Beck et al., by Sarkar, and by Cytron et al., and strengthen a result of the latter concerning dominance frontiers and join sets.

Journal ArticleDOI
TL;DR: In this article, the authors present a new approach to static program analysis that permits each expression in a program to be assigned an execution time estimate, which is either an integer upper bound on the number of ticks the expression will execute, or the distinguished element long that indicates that the expression contains a loop, and thus may run for an arbitrary length of time.
Abstract: We present a new approach to static program analysis that permits each expression in a program to be assigned an execution time estimate. Our approach uses a time system in conjunction with a conventional type system to compute both the type and the time of an expression. The time of an expression is either an integer upper bound on the number of ticks the expression will execute, or the distinguished element long that indicates that the expression contains a loop, and thus may run for an arbitrary length of time. Every function type includes a latent time that is used to communicate its expected execution time from the point of its definition to the points of its use. Unlike previous approaches, a time system works in the presence of first-class functions and separate compilation. In addition, time polymorphism allows the time of a function to depend on the times of any functions that it takes as arguments. Time estimates are useful when compiling programs for multiprocessors in order to balance the overhead of initiating a concurrent computation against the expected execution time of the computation. The correctness of our time system is proven with respect to a dynamic semantics.
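A toy version of such a time estimator can be written as a recursive function over expression trees (our sketch; the paper's system is a static type-and-time discipline with latent times on function types, which this omits): times are integer tick bounds, and any loop forces the distinguished value long.

```python
# Times are integer tick bounds; LONG means "contains a loop, may run
# arbitrarily long".  Expression forms and tick costs are our invention.
LONG = "long"

def add_times(a, b):
    return LONG if LONG in (a, b) else a + b

def max_times(a, b):
    return LONG if LONG in (a, b) else max(a, b)

def time_of(expr):
    op = expr[0]
    if op == "const":
        return 1                            # one tick for a constant
    if op == "prim":                        # ("prim", e1, e2)
        return add_times(1, add_times(time_of(expr[1]), time_of(expr[2])))
    if op == "if":                          # test plus worst-case branch
        return add_times(time_of(expr[1]),
                         max_times(time_of(expr[2]), time_of(expr[3])))
    if op == "loop":
        return LONG
    raise ValueError(op)

e = ("if", ("const",), ("prim", ("const",), ("const",)), ("const",))
print(time_of(e))                            # -> 4
print(time_of(("prim", e, ("loop",))))       # -> long
```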

Journal ArticleDOI
TL;DR: A new approach is presented that leads to the improved analysis and transformation of programs with recursively defined pointer data structures based on a mechanism for the Abstract Description of Data Structures (ADDS).
Abstract: Even though impressive progress has been made in the area of optimizing and parallelizing array-based programs, the application of similar techniques to programs using pointer data structures has remained difficult. Unlike arrays which have a small number of well-defined properties, pointers can be used to implement a wide variety of structures which exhibit a much larger set of properties. The diversity of these structures implies that programs with pointer data structures cannot be effectively analyzed by traditional optimizing and parallelizing compilers.In this paper we present a new approach that leads to the improved analysis and transformation of programs with recursively defined pointer data structures. Our approach is based on a mechanism for the Abstract Description of Data Structures (ADDS). ADDS is a simple extension to existing imperative languages that allows the programmer to explicitly describe the important properties of a large class of data structures. These abstract descriptions may be used by the compiler to achieve more accurate program analysis in the presence of pointers, which in turn enables and improves the application of numerous optimizing and parallelizing transformations. We present ADDS by describing various data structures; we discuss how such descriptions can be used to improve analysis and debugging; and we supply three important transformations enabled by ADDS.

Journal ArticleDOI
TL;DR: This paper shows how to make the latency of scanning a page in the Appel-Ellis-Li real-time garbage collector be proportional only to the number of object references on a page (the page size), instead of to the sum of the sizes of the objects referenced by the page.
Abstract: This paper shows how to make the latency of scanning a page in the Appel-Ellis-Li real-time garbage collector be proportional only to the number of object references on a page (the page size), instead of to the sum of the sizes of the objects referenced by the page. This makes the garbage collection algorithm much more suitable for real-time systems.

Journal ArticleDOI
TL;DR: The goal of the work presented in this paper is not to provide a more efficient computation of shortest paths but to investigate how the intermediate tables, known as extension tables, generated by the complete evaluation strategy might be used in approximation algorithms.
Abstract: An approximation paradigm is proposed for logic programming as a simple modification to a complete evaluation strategy. The motivational example illustrates how a straightforward transformation of a declarative specification of the distance between two vertices in a directed graph leads to sophisticated algorithms for computing shortest paths. The goal of the work presented in this paper is not to provide a more efficient computation of shortest paths but to investigate how the intermediate tables, known as extension tables, generated by the complete evaluation strategy might be used in approximation algorithms. To put the ET distance algorithm in perspective, we compare its execution to those of Dijkstra's single-source and Floyd's all-pairs shortest-path algorithms.
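The flavor of the approach can be sketched in Python (our illustration, not the paper's ET-based evaluator): a table of derived distance facts is re-derived to a fixpoint, and converges to the same answers Dijkstra's algorithm computes directly.

```python
import heapq

# Tabled, iterated evaluation of the distance relation.
def et_distance(edges, src):
    table = {src: 0}                  # the table of derived facts
    changed = True
    while changed:                    # re-derive until no entry improves
        changed = False
        for u, v, w in edges:
            if u in table and table[u] + w < table.get(v, float("inf")):
                table[v] = table[u] + w
                changed = True
    return table

def dijkstra(edges, src):
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    dist, pq = {}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in dist:
            continue
        dist[u] = d
        for v, w in adj.get(u, []):
            if v not in dist:
                heapq.heappush(pq, (d + w, v))
    return dist

edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 1)]
print(et_distance(edges, "a") == dijkstra(edges, "a"))  # -> True
```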

Journal ArticleDOI
TL;DR: It is shown that static single assignment form does not remove all antidependences, and that it conflicts with table-driven code generation for 2-address machines, and how to solve them is described.
Abstract: Static single assignment form represents data dependences elegantly and provides a basis for powerful optimizations. Table-driven techniques for peephole optimization and code generation are straightforward and effective. It is natural to want to use both together in a code optimizer. However, doing so reveals that static single assignment form does not remove all antidependences, and that it conflicts with table-driven code generation for 2-address machines. This paper describes these problems and how to solve them.

Journal ArticleDOI
TL;DR: The paper describes horizontal partitioning, code generation in MPL, and the efficiency of programs generated for the MasPar SIMD machine.
Abstract: Massively parallel SIMD machines rely on data parallelism usually achieved by careful hand coding to support program efficiency. This paper describes parallelization of code generated for SIMD machines by the compiler for the Equational Programming Language, EPL. The language supports architecture-independent scientific programming by recurrent equations. The EPL compiler serves as a programming aid for users of parallel machines by automating data partitioning and computation parallelization based on inherent data dependencies. In support of a Connection Machine architecture, the EPL compiler performs horizontal partitioning of the program, a process that selects a dimension of each data structure to be projected along the processor array. Each processor then holds a single instance of that structure, and operations along the projected dimension are done in parallel. The paper describes horizontal partitioning, code generation in MPL, and the efficiency of programs generated for the MasPar SIMD machine.

Journal ArticleDOI
TL;DR: Several potential definitions of dependence distance are identified, all of which give the same answer for normalized loops (loops with constant lower bounds and a step of 1).
Abstract: Data dependence distance is widely used to characterize data dependences in advanced optimizing compilers. The standard definition of dependence distance assumes that loops are normalized (have constant lower bounds and a step of 1); there is not a commonly accepted definition for unnormalized loops. We have identified several potential definitions, all of which give the same answer for normalized loops. There are a number of subtleties involved in choosing between these definitions, and no one definition is suitable for all applications.
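The ambiguity is easy to make concrete (our example, not the paper's): in the Fortran-style loop do i = 2, 10, 2 with the statement A(i) = A(i-4) + 1, the iteration with i = 6 writes A(6) and the iteration with i = 10 reads it, so a definition in loop-index values gives distance 4 while one in iteration counts gives 2. With a normalized step of 1 the two coincide.

```python
# Two candidate definitions of dependence distance, applied to the
# unnormalized loop  do i = 2, 10, 2:  A(i) = A(i-4) + 1.
def index_distance(i_write, i_read):
    return i_read - i_write                 # in loop-index values

def iteration_distance(i_write, i_read, step):
    return (i_read - i_write) // step       # in iteration counts

# A(6) is written when i = 6 and read (as A(i-4)) when i = 10:
print(index_distance(6, 10), iteration_distance(6, 10, 2))  # -> 4 2
print(iteration_distance(6, 10, 1))  # normalized step of 1: -> 4
```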

Journal ArticleDOI
Michael G. Burke, Jong-Deok Choi
TL;DR: This document describes a mechanism which, in factoring in interprocedural aliases, computes data-flow information more precisely and with less time and space overhead than previous approaches.
Abstract: Data-flow analysis is a basis for program optimization and parallelizing transformations. The mechanism of passing reference parameters at call sites generates interprocedural aliases which complicate this analysis. Solutions have been developed for efficiently computing interprocedural aliases. However, factoring the computed aliases into data-flow information has been mostly overlooked, although improper factoring results in imprecise (conservative) data-flow information. In this document, we describe a mechanism which, in factoring in interprocedural aliases, computes data-flow information more precisely and with less time and space overhead than previous approaches.

Journal ArticleDOI
B. Ramkumar
TL;DR: An important optimization for portable parallel logic programming is presented, namely distributed last-call optimization, an analog of the tail recursion optimization for sequential Prolog.
Abstract: A difficult and challenging problem is the efficient exploitation of AND and OR parallelism in logic programs without making any assumptions about the underlying target machine(s). In earlier papers, we described the design of a binding environment for AND and OR parallel execution of logic programs on shared and nonshared memory machines and the performance of a compiler (called ROLOG) using this binding environment on a range of MIMD parallel machines. In this paper, we present an important optimization for portable parallel logic programming, namely distributed last-call optimization, an analog of the tail recursion optimization for sequential Prolog. This scheme has been implemented in the ROLOG compiler, which ports unchanged to several shared memory and nonshared memory machines. We describe the effect of this optimization on several OR, AND/OR, and AND parallel benchmark programs.

Journal ArticleDOI
David E. Goldberg
TL;DR: The issues involved in designing the floating-point part of a programming language are discussed and it is shown that there are more significant semantic issues involved.
Abstract: The issues involved in designing the floating-point part of a programming language are discussed. Looking at the language specifications for most existing languages might suggest that this design involves only trivial issues, such as whether to have one or two types of REALs or how to name the functions that convert from INTEGER to REAL. It is shown that there are more significant semantic issues involved. After discussing the trade-offs for the major design decisions, they are illustrated by presenting the design of the floating-point part of the Modula-3 language.
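One such semantic issue (our example, not necessarily the paper's) is whether a compiler may reassociate floating-point expressions, since floating-point addition is not associative:

```python
# At 1e16 the spacing between adjacent doubles (one ulp) is 2.0, so
# adding 1.0 rounds back to 1e16, while adding 2.0 survives exactly.
a, b, c = 1e16, 1.0, 1.0
left = (a + b) + c     # both 1.0s vanish
right = a + (b + c)    # the 2.0 is representable and survives
print(left == right)   # -> False
print(right - left)    # -> 2.0
```

A language that leaves evaluation order or reassociation unspecified therefore leaves the value of such expressions unspecified too.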

Journal ArticleDOI
TL;DR: This paper presents a space-efficient algorithm for computing the lifetimes of intermediate values that is used by an optimizing compiler for the Icon programming language and is applicable to other programming languages that employ goal-directed evaluation.
Abstract: In programming languages that support goal-directed evaluation to make use of alternative results, an expression can produce a value, suspend, and later be resumed to produce another value. This causes control backtracking to earlier points in a computation and complicates the maintenance of intermediate values. This paper presents a space-efficient algorithm for computing the lifetimes of intermediate values that is used by an optimizing compiler for the Icon programming language. The algorithm is applicable to other programming languages that employ goal-directed evaluation.

Journal ArticleDOI
TL;DR: A construction for atomic registers is presented; this construction has the surprising property that it is correct with respect to a specification based on partial orders but is incorrect withrespect to a naively derived specificationbased on global time.
Abstract: Concurrency in distributed systems is usually modeled by a nondeterministic interleaving of atomic events. The consequences of this interleaving (or global time) assumption on the specifications and proofs of distributed programs are examined in this paper. A construction for atomic registers is presented; this construction has the surprising property that it is correct with respect to a specification based on partial orders but is incorrect with respect to a naively derived specification based on global time.