scispace - formally typeset
Search or ask a question

An Executable Formal Semantics of C with Applications: Technical Report

TL;DR: In this paper, the authors present an executable formal semantics of C. The semantics yields an interpreter, debugger, state space search tool, and model checker, which is shown capable of automatically finding program errors, both statically and at runtime.
Abstract: This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 770 of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker “for free”. The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.
Citations
More filters
Proceedings ArticleDOI
09 Jun 2014
TL;DR: This work introduces equivalence modulo inputs (EMI), a simple, widely applicable methodology for validating optimizing compilers, and profiles a program's test executions and stochastically prune its unexecuted code to create a practical implementation.
Abstract: We introduce equivalence modulo inputs (EMI), a simple, widely applicable methodology for validating optimizing compilers. Our key insight is to exploit the close interplay between (1) dynamically executing a program on some test inputs and (2) statically compiling the program to work on all possible inputs. Indeed, the test inputs induce a natural collection of the original program's EMI variants, which can help differentially test any compiler and specifically target the difficult-to-find miscompilations. To create a practical implementation of EMI for validating C compilers, we profile a program's test executions and stochastically prune its unexecuted code. Our extensive testing in eleven months has led to 147 confirmed, unique bug reports for GCC and LLVM alone. The majority of those bugs are miscompilations, and more than 100 have already been fixed. Beyond testing compilers, EMI can be adapted to validate program transformation and analysis systems in general. This work opens up this exciting, new direction.

363 citations

Proceedings ArticleDOI
09 Jul 2018
TL;DR: KEVM is presented, an executable formal specification of the EVM's bytecode stack-based language built with the K Framework, designed to serve as a solid foundation for further formal analyses and to demonstrate the usability of the semantics.
Abstract: A developing field of interest for the distributed systems and applied cryptography communities is that of smart contracts: self-executing financial instruments that synchronize their state, often through a blockchain. One such smart contract system that has seen widespread practical adoption is Ethereum, which has grown to a market capacity of 100 billion USD and clears an excess of 500,000 daily transactions. Unfortunately, the rise of these technologies has been marred by a series of costly bugs and exploits. Increasingly, the Ethereum community has turned to formal methods and rigorous program analysis tools. This trend holds great promise due to the relative simplicity of smart contracts and bounded-time deterministic execution inherent to the Ethereum Virtual Machine (EVM). Here we present KEVM, an executable formal specification of the EVM's bytecode stack-based language built with the K Framework, designed to serve as a solid foundation for further formal analyses. We empirically evaluate the correctness and performance of KEVM using the official Ethereum test suite. To demonstrate the usability, several extensions of the semantics are presented. and two different-language implementations of the ERC20 Standard Token are verified against the ERC20 specification. These results are encouraging for the executable semantics approach to language prototyping and specification.

299 citations

Proceedings ArticleDOI
11 Jun 2012
TL;DR: It is concluded that effective program reduction requires more than straightforward delta debugging, so three new, domain-specific test-case reducers are designed and implemented based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations.
Abstract: To report a compiler bug, one must often find a small test case that triggers the bug. The existing approach to automated test-case reduction, delta debugging, works by removing substrings of the original input; the result is a concatenation of substrings that delta cannot remove. We have found this approach less than ideal for reducing C programs because it typically yields test cases that are too large or even invalid (relying on undefined behavior). To obtain small and valid test cases consistently, we designed and implemented three new, domain-specific test-case reducers. The best of these is based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations. This reducer produces outputs that are, on average, more than 25 times smaller than those produced by our other reducers or by the existing reducer that is most commonly used by compiler developers. We conclude that effective program reduction requires more than straightforward delta debugging.

240 citations

Proceedings ArticleDOI
14 Jan 2015
TL;DR: K-Java is presented, a complete executable formal semantics of Java 1.4 that is applied to model-check multi-threaded programs and is generic and ready to be used in other Java-related projects.
Abstract: This paper presents K-Java, a complete executable formal semantics of Java 1.4. K-Java was extensively tested with a test suite developed alongside the project, following the Test Driven Development methodology. In order to maintain clarity while handling the great size of Java, the semantics was split into two separate definitions -- a static semantics and a dynamic semantics. The output of the static semantics is a preprocessed Java program, which is passed as input to the dynamic semantics for execution. The preprocessed program is a valid Java program, which uses a subset of the features of Java. The semantics is applied to model-check multi-threaded programs. Both the test suite and the static semantics are generic and ready to be used in other Java-related projects.

137 citations

Proceedings ArticleDOI
03 Nov 2013
TL;DR: A novel model is proposed, which views unstable code in terms of optimizations that leverage undefined behavior, and a new static checker called Stack is introduced that precisely identifies unstable code.
Abstract: This paper studies an emerging class of software bugs called optimization-unstable code: code that is unexpectedly discarded by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database. The consequences of unstable code range from incorrect functionality to missing security checks.To reason about unstable code, this paper proposes a novel model, which views unstable code in terms of optimizations that leverage undefined behavior. Using this model, we introduce a new static checker called Stack that precisely identifies unstable code. Applying Stack to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.

130 citations

References
More filters
Book
01 Jan 1997
TL;DR: Advanced Compiler Design and Implementation by Steven Muchnick Preface to Advanced Topics
Abstract: Advanced Compiler Design and Implementation by Steven Muchnick Preface 1 Introduction to Advanced Topics 1.1 Review of Compiler Structure 1.2 Advanced Issues in Elementary Topics 1.3 The Importance of Code Optimization 1.4 Structure of Optimizing Compilers 1.5 Placement of Optimizations in Aggressive Optimizing Compilers 1.6 Reading Flow Among the Chapters 1.7 Related Topics Not Covered in This Text 1.8 Target Machines Used in Examples 1.9 Number Notations and Data Sizes 1.10 Wrap-Up 1.11 Further Reading 1.12 Exercises 2 Informal Compiler Algorithm Notation (ICAN) 2.1 Extended Backus-Naur Form Syntax Notation 2.2 Introduction to ICAN 2.3 A Quick Overview of ICAN 2.4 Whole Programs 2.5 Type Definitions 2.6 Declarations 2.7 Data Types and Expressions 2.8 Statements 2.9 Wrap-Up 2.10 Further Reading 2.11 Exercises 3 Symbol-Table Structure 3.1 Storage Classes, Visibility, and Lifetimes 3.2 Symbol Attributes and Symbol-Table Entries 3.3 Local Symbol-Table Management 3.4 Global Symbol-Table Structure 3.5 Storage Binding and Symbolic Registers 3.6 Approaches to Generating Loads and Stores 3.7 Wrap-Up 3.8 Further Reading 3.9 Exercises 4 Intermediate Representations 4.1 Issues in Designing an Intermediate Language 4.2 High-Level Intermediate Languages 4.3 Medium-Level Intermediate Languages 4.4 Low-Level Intermediate Languages 4.5 Multi-Level Intermediate Languages 4.6 Our Intermediate Languages: MIR, HIR, and LIR 4.7 Representing MIR, HIR, and LIR in ICAN 4.8 ICAN Naming of Data Structures and Routines that Manipulate Intermediate Code 4.9 Other Intermediate-Language Forms 4.10 Wrap-Up 4.11 Further Reading 4.12 Exercises 5 Run-Time Support 5.1 Data Representations and Instructions 5.2 Register Usage 5.3 The Local Stack Frame 5.4 The Run-Time Stack 5.5 Parameter-Passing Disciplines 5.6 Procedure Prologues, Epilogues, Calls, and Returns 5.7 Code Sharing and Position-Independent Code 5.8 Symbolic and Polymorphic Language Support 5.9 Wrap-Up 5.10 Further Reading 5.11 Exercises 6 Producing Code Generators Automatically 6.1 Introduction to Automatic Generation of Code Generators 6.2 A Syntax-Directed Technique 6.3 Introduction to Semantics-Directed Parsing 6.4 Tree Pattern Matching and Dynamic Programming 6.5 Wrap-Up 6.6 Further Reading 6.7 Exercises 7 Control-Flow Analysis 7.1 Approaches to Control-Flow Analysis 7.2 Depth-First Search, Preorder Traversal, Postorder Traversal, and Breadth-First Search 7.3 Dominators 7.4 Loops and Strongly Connected Components 7.5 Reducibility 7.6 Interval Analysis and Control Trees 7.7 Structural Analysis 7.8 Wrap-Up 7.9 Further Reading 7.10 Exercises 8 Data-Flow Analysis 8.1 An Example: Reaching Definitions 8.2 Basic Concepts: Lattices, Flow Functions, and Fixed Points 8.3 Taxonomy of Data-Flow Problems and Solution Methods 8.4 Iterative Data-Flow Analysis 8.5 Lattices of Flow Functions 8.6 Control-Tree-Based Data-Flow Analysis 8.7 Structural Analysis 8.8 Interval Analysis 8.9 Other Approaches 8.10 Du-Chains, Ud-Chains, and Webs 8.11 Static Single-Assignment (SSA) Form 8.12 Dealing with Arrays, Structures, and Pointers 8.13 Automating Construction of Data-Flow Analyzers 8.14 More Ambitious Analyses 8.15 Wrap-Up 8.16 Further Reading 8.17 Exercises 9 Dependence Analysis and Dependence Graph 9.1 Dependence Relations 9.2 Basic-Block Dependence DAGs 9.3 Dependences in Loops 9.4 Dependence Testing 9.5 Program-Dependence Graphs 9.6 Dependences Between Dynamically Allocated Objects 9.7 Wrap-Up 9.8 Further Reading 9.9 Exercises 10 Alias Analysis 10.1 Aliases in Various Real Programming Languages 10.2 The Alias Gatherer 10.3 The Alias Propagator 10.4 Wrap-Up 10.5 Further Reading 10.6 Exercises 11 Introduction to Optimization 11.1 Global Optimizations Discussed in Chapters 12 Through 18 11.2 Flow Sensitivity and May vs. Must Information 11.3 Importance of Individual Optimizations 11.4 Order and Repetition of Optimizations 11.5 Further Reading 11.6 Exercises 12 Early Optimizations 12.1 Constant-Expression Evaluation (Constant Folding) 12.2 Scalar Replacement of Aggregates 12.3 Algebraic Simplifications and Reassociation 12.4 Value Numbering 12.5 Copy Propagation 12.6 Sparse Conditional Constant Propagation 12.7 Wrap-Up 12.8 Further Reading 12.9 Exercises 13 Redundancy Elimination 13.1 Common-Subexpression Elimination 13.2 Loop-Invariant Code Motion 13.3 Partial-Redundancy Elimination 13.4 Redundancy Elimination and Reassociation 13.5 Code Hoisting 13.6 Wrap-Up 13.7 Further Reading 13.8 Exercises 14 Loop Optimizations 14.1 Induction-Variable Optimizations 14.2 Unnecessary Bounds-Checking Elimination 14.3 Wrap-Up 14.4 Further Reading 14.5 Exercises 15 Procedure Optimizations 15.1 Tail-Call Optimization and Tail-Recursion Elimination 15.2 Procedure Integration 15.3 In-Line Expansion 15.4 Leaf-Routine Optimization and Shrink Wrapping 15.5 Wrap-Up 15.6 Further Reading 15.7 Exercises 16 Register Allocation 16.1 Register Allocation and Assignment 16.2 Local Methods 16.3 Graph Coloring 16.4 Priority-Based Graph Coloring 16.5 Other Approaches to Register Allocation 16.6 Wrap-Up 16.7 Further Reading 16.8 Exercises 17 Code Scheduling 17.1 Instruction Scheduling 17.2 Speculative Loads and Boosting 17.3 Speculative Scheduling 17.4 Software Pipelining 17.5 Trace Scheduling 17.6 Percolation Scheduling 17.7 Wrap-Up 17.8 Further Reading 17.9 Exercises 18 Control-Flow and Low-Level Optimizations 18.1 Unreachable-Code Elimination 18.2 Straightening 18.3 If Simplifications 18.4 Loop Simplifications 18.5 Loop Inversion 18.6 Unswitching 18.7 Branch Optimizations 18.8 Tail Merging or Cross Jumping 18.9 Conditional Moves 18.10 Dead-Code Elimination 18.11 Branch Prediction 18.12 Machine Idioms and Instruction Combining 18.13 Wrap-Up 18.14 Further Reading 18.15 Exercises 19 Interprocedural Analysis and Optimization 19.1 Interprocedural Control-Flow Analysis: The Call Graph 19.2 Interprocedural Data-Flow Analysis 19.3 Interprocedural Constant Propagation 19.4 Interprocedural Alias Analysis 19.5 Interprocedural Optimizations 19.6 Interprocedural Register Allocation 19.7 Aggregation of Global References 19.8 Other Issues in Interprocedural Program Management 19.9 Wrap-Up 19.10 Further Reading 19.11 Exercises 20 Optimization of the Memory Hierarchy 20.1 Impact of Data and Instruction Caches 20.2 Instruction-Cache Optimization 20.3 Scalar Replacement of Array Elements 20.4 Data-Cache Optimization 20.5 Scalar vs. Memory-Oriented Optimizations 20.6 Wrap-Up 20.7 Further Reading 20.8 Exercises 21 Case Studies of Compilers and Future Trends 21.1 the Sun Compilers for SPARC 21.2 The IBM XL Compilers for the POWER and PowerPC Architectures 21.3 Digital Equipment's Compilers for Alpha 21.4 The Intel Reference Compilers for the Intel 386 Architecture 21.5 Future Trends in Compiler Design and Implementation 21.6 Further Reading A Guide to Assembly Languages Used in This Book A.1 Sun SPARC Versions 8 and 9 Assembly Language A.2 IBM POWER and PowerPC Assembly Language A.3 DEC Alpha Assembly Language A.4 Intel 386 Architecture Assembly Language A.5 Hewlett-Packard's PA-RISC Assembly Language B Representation of Sets, Sequences, Trees, DAGs, and Functions B.1 Representation of Sets B.2 Representation of Sequences B.3 Representation of Trees and DAGs B.4 Representation of Functions B.5 Further Reading C Software Resources View Appendix C with live links to download sites C.1 Finding and Accessing Software on the Internet C.2 Machine Simulators C.3 Compilers C.4 Code-Generator Generators: BURG and IBURG C.5 Profiling Tools Bibliography Indices

2,482 citations

01 Jan 1978
TL;DR: This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.), and is a "must-have" reference for every serious programmer's digital library.
Abstract: This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.). One of the best-selling programming books published in the last fifty years, "K&R" has been called everything from the "bible" to "a landmark in computer science" and it has influenced generations of programmers. Available now for all leading ebook platforms, this concise and beautifully written text is a "must-have" reference for every serious programmers digital library. As modestly described by the authors in the Preface to the First Edition, this "is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable colleague will help."

2,120 citations

Book
01 Jan 1978
TL;DR: The C Programming Language (2nd Ed.) as discussed by the authors is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Languages (1st Ed.).
Abstract: This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.). One of the best-selling programming books published in the last fifty years, "K&R" has been called everything from the "bible" to "a landmark in computer science" and it has influenced generations of programmers. Available now for all leading ebook platforms, this concise and beautifully written text is a "must-have" reference for every serious programmers digital library. As modestly described by the authors in the Preface to the First Edition, this "is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable colleague will help."

1,861 citations

Journal ArticleDOI
TL;DR: Maude as discussed by the authors is a programming language whose modules are rewriting logic theories, which is defined and given denotational and operational semantics, and it provides a simple unification of concurrent programming with functional and object-oriented programming and supports high level declarative programming of concurrent systems.

1,345 citations

Book ChapterDOI
08 Apr 2002
TL;DR: The structure of CIL is described, with a focus on how it disambiguates those features of C that were found to be most confusing for program analysis and transformation, allowing a complete project to be viewed as a single compilation unit.
Abstract: This paper describes the C Intermediate Language: a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs.Compared to C, CIL has fewer constructs. It breaks down certain complicated constructs of C into simpler ones, and thus it works at a lower level than abstract-syntax trees. But CIL is also more high-level than typical intermediate languages (e.g., three-address code) designed for compilation. As a result, what we have is a representation that makes it easy to analyze and manipulate C programs, and emit them in a form that resembles the original source. Moreover, it comes with a front-end that translates to CIL not only ANSI C programs but also those using Microsoft C or GNU C extensions.We describe the structure of CIL with a focus on how it disambiguates those features of C that we found to be most confusing for program analysis and transformation. We also describe a whole-program merger based on structural type equality, allowing a complete project to be viewed as a single compilation unit. As a representative application of CIL, we show a transformation aimed at making code immune to stack-smashing attacks. We are currently using CIL as part of a system that analyzes and instruments C programs with run-time checks to ensure type safety. CIL has served us very well in this project, and we believe it can usefully be applied in other situations as well.

1,065 citations