Pixy: a static analysis tool for detecting Web application vulnerabilities

doi:10.1109/SP.2006.29


Proceedings Article

Nenad Jovanovic¹, Christopher Kruegel¹, Engin Kirda¹•Institutions (1)

Vienna University of Technology¹

21 May 2006, pp. 258-263

TL;DR: This paper uses flow-sensitive, interprocedural and context-sensitive dataflow analysis to discover vulnerable points in a program and applies it to the detection of vulnerability types such as SQL injection, cross-site scripting, or command injection.


Abstract: The number and the importance of Web applications have increased rapidly over the last years. At the same time, the quantity and impact of security vulnerabilities in such applications have grown as well. Since manual code reviews are time-consuming, error-prone and costly, the need for automated solutions has become evident. In this paper, we address the problem of vulnerable Web applications by means of static source code analysis. More precisely, we use flow-sensitive, interprocedural and context-sensitive dataflow analysis to discover vulnerable points in a program. In addition, alias and literal analysis are employed to improve the correctness and precision of the results. The presented concepts are targeted at the general class of taint-style vulnerabilities and can be applied to the detection of vulnerability types such as SQL injection, cross-site scripting, or command injection. Pixy, the open source prototype implementation of our concepts, is targeted at detecting cross-site scripting vulnerabilities in PHP scripts. Using our tool, we discovered and reported 15 previously unknown vulnerabilities in three Web applications, and reconstructed 36 known vulnerabilities in three other Web applications. The observed false positive rate is at around 50% (i.e., one false positive for each vulnerability) and therefore, low enough to permit effective security audits.
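The taint-style analysis described in the abstract can be illustrated with a minimal Python sketch. It tracks, statement by statement, which variables may hold attacker-controlled data and reports any that reach a sink. The tuple-based statement encoding, the statement kinds, and the `analyze` function are hypothetical and are not taken from Pixy, which analyzes PHP and additionally performs interprocedural, alias, and literal analysis.

```python
from typing import List, Set

Stmt = tuple  # hypothetical mini-IR: ("source", x), ("assign", x, [ops]), ...


def analyze(block: List[Stmt], tainted: Set[str], findings: List[str]) -> Set[str]:
    """Return the set of possibly tainted variables after executing block."""
    tainted = set(tainted)
    for stmt in block:
        kind = stmt[0]
        if kind == "source":              # e.g. $x = $_GET['x']
            tainted.add(stmt[1])
        elif kind == "assign":            # target computed from operands
            _, target, operands = stmt
            if any(op in tainted for op in operands):
                tainted.add(target)
            else:
                tainted.discard(target)   # flow-sensitive: overwriting kills taint
        elif kind == "sanitize":          # e.g. $x = htmlentities($x)
            tainted.discard(stmt[1])
        elif kind == "sink":              # e.g. echo $x
            if stmt[1] in tainted:
                findings.append(stmt[1])
        elif kind == "if":                # join: tainted if tainted on either branch
            _, then_block, else_block = stmt
            after_then = analyze(then_block, tainted, findings)
            after_else = analyze(else_block, tainted, findings)
            tainted = after_then | after_else
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return tainted


program = [
    ("source", "name"),
    ("if", [("sanitize", "name")], []),   # only one branch sanitizes
    ("assign", "msg", ["name"]),
    ("sink", "msg"),                      # reported: tainted on the else path
    ("sanitize", "msg"),
    ("sink", "msg"),                      # not reported: sanitized just above
]
findings: List[str] = []
analyze(program, set(), findings)
```

Because the analysis is flow-sensitive, the same variable `msg` is reported at the first sink but not at the second. The false positive figure in the abstract follows from the same kind of counting: one false positive per real vulnerability gives FP / (FP + TP) = 1 / (1 + 1) = 50%.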


Citations


Proceedings Article

CHEX: statically vetting Android apps for component hijacking vulnerabilities

Long Lu¹, Zhichun Li², Zhenyu Wu², Wenke Lee¹, Guofei Jiang²•Institutions (2)

Georgia Institute of Technology¹, Princeton University²

16 Oct 2012

TL;DR: This paper proposes CHEX, a static analysis method to automatically vet Android apps for component hijacking vulnerabilities, and prototypes it on Dalysis, a generic static analysis framework built to support many types of analysis on Android app bytecode.

Abstract: An enormous number of apps have been developed for Android in recent years, making it one of the most popular mobile operating systems. However, the quality of the booming apps can be a concern [4]. Poorly engineered apps may contain security vulnerabilities that can severely undermine users' security and privacy. In this paper, we study a general category of vulnerabilities found in Android apps, namely the component hijacking vulnerabilities. Several types of previously reported app vulnerabilities, such as permission leakage, unauthorized data access, intent spoofing, etc., belong to this category. We propose CHEX, a static analysis method to automatically vet Android apps for component hijacking vulnerabilities. Modeling these vulnerabilities from a data-flow analysis perspective, CHEX analyzes Android apps and detects possible hijack-enabling flows by conducting low-overhead reachability tests on customized system dependence graphs. To tackle analysis challenges imposed by Android's special programming paradigm, we employ a novel technique to discover component entry points in their completeness and introduce app splitting to model the asynchronous executions of multiple entry points in an app. We prototyped CHEX based on Dalysis, a generic static analysis framework that we built to support many types of analysis on Android app bytecode. We evaluated CHEX with 5,486 real Android apps and found 254 potential component hijacking vulnerabilities. The median execution time of CHEX on an app is 37.02 seconds, which is fast enough to be used in very high volume app vetting and testing scenarios.

631 citations
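The reachability tests that the CHEX abstract mentions can be pictured, in a much reduced form, as a path search over a dependence graph. The sketch below is only an assumption about the general shape of such a test: the adjacency-list encoding, the node names, and `find_flow` are hypothetical, and CHEX's customized system dependence graphs and app splitting are not modeled.

```python
from collections import deque
from typing import Dict, List, Optional


def find_flow(graph: Dict[str, List[str]], source: str, sink: str) -> Optional[List[str]]:
    """Breadth-first search for a path from source to sink; None if unreachable."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for succ in graph.get(node, []):
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None


# Hypothetical data-dependence edges for a toy app component.
deps = {
    "intent_extra": ["url"],
    "url": ["http_request"],
    "device_id": ["log_line"],
}
flow = find_flow(deps, "intent_extra", "http_request")
no_flow = find_flow(deps, "device_id", "http_request")
```

Here `flow` is the path from the externally supplied value to the sensitive operation, while `no_flow` is `None` because no dependence edge connects the two nodes.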

Proceedings Article

Cross Site Scripting Prevention with Dynamic Data Tainting and Static Analysis

Philipp Vogt¹, Florian Nentwich¹, Nenad Jovanovic¹, Engin Kirda¹, Christopher Krügel¹, Giovanni Vigna²•Institutions (2)

Vienna University of Technology¹, University of California, Santa Barbara²

01 Jan 2007

TL;DR: The solution presented in this paper stops XSS attacks on the client side by tracking the flow of sensitive information inside the web browser; if sensitive information is about to be transferred to a third party, the user can decide whether this should be permitted.

Abstract: Cross-site scripting (XSS) is an attack against web applications in which scripting code is injected into the output of an application that is then sent to a user’s web browser. In the browser, this scripting code is executed and used to transfer sensitive data to a third party (i.e., the attacker). Currently, most approaches attempt to prevent XSS on the server side by inspecting and modifying the data that is exchanged between the web application and the user. Unfortunately, it is often the case that vulnerable applications are not fixed for a considerable amount of time, leaving the users vulnerable to attacks. The solution presented in this paper stops XSS attacks on the client side by tracking the flow of sensitive information inside the web browser. If sensitive information is about to be transferred to a third party, the user can decide if this should be permitted or not. As a result, the user has an additional protection layer when surfing the web, without solely depending on the security of the web application.

561 citations

Additional excerpts

...This could allow an attacker to leak information in a similar fashion....
[...]

Proceedings Article

PiOS: Detecting privacy leaks in iOS applications

Manuel Egele¹•Institutions (1)

University of California, Santa Barbara¹

01 Jan 2011

TL;DR: To protect its users from malicious applications, Apple has introduced a vetting process, which should ensure that all applications conform to Apple’s (privacy) rules before they can be offered via the App Store, but this vetting process is not well documented.

Abstract: With the introduction of Apple’s iOS and Google’s Android operating systems, the sales of smartphones have exploded. These smartphones have become powerful devices that are basically miniature versions of personal computers. However, the growing popularity and sophistication of smartphones have also increased concerns about the privacy of users who operate these devices. These concerns have been exacerbated by the fact that it has become increasingly easy for users to install and execute third-party applications. To protect its users from malicious applications, Apple has introduced a vetting process. This vetting process should ensure that all applications conform to Apple’s (privacy) rules before they can be offered via the App Store. Unfortunately, this vetting process is not well documented, and there have been cases where malicious applications had to be removed from the App Store after

536 citations

Proceedings Article

Modeling and Discovering Vulnerabilities with Code Property Graphs

Fabian Yamaguchi¹, Nico Golde², Daniel Arp¹, Konrad Rieck¹•Institutions (2)

University of Göttingen¹, Qualcomm²

18 May 2014

TL;DR: This paper introduces a novel representation of source code called a code property graph that merges concepts of classic program analysis, namely abstract syntax trees, control flow graphs and program dependence graphs, into a joint data structure. This representation makes it possible to model templates for common vulnerabilities as graph traversals that can identify buffer overflows, integer overflows, format string vulnerabilities, or memory disclosures.

Abstract: The vast majority of security breaches encountered today are a direct result of insecure code. Consequently, the protection of computer systems critically depends on the rigorous identification of vulnerabilities in software, a tedious and error-prone process requiring significant expertise. Unfortunately, a single flaw suffices to undermine the security of a system and thus the sheer amount of code to audit plays into the attacker's cards. In this paper, we present a method to effectively mine large amounts of source code for vulnerabilities. To this end, we introduce a novel representation of source code called a code property graph that merges concepts of classic program analysis, namely abstract syntax trees, control flow graphs and program dependence graphs, into a joint data structure. This comprehensive representation enables us to elegantly model templates for common vulnerabilities with graph traversals that, for instance, can identify buffer overflows, integer overflows, format string vulnerabilities, or memory disclosures. We implement our approach using a popular graph database and demonstrate its efficacy by identifying 18 previously unknown vulnerabilities in the source code of the Linux kernel.


461 citations

Proceedings Article

Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications

Davide Balzarotti¹, Marco Cova¹, Viktoria Felmetsger¹, Nenad Jovanovic, Engin Kirda², Christopher Kruegel¹, Giovanni Vigna¹•Institutions (2)

University of California, Santa Barbara¹, Institut Eurécom²

18 May 2008

TL;DR: This paper combines static and dynamic analysis techniques to identify faulty sanitization procedures that can be bypassed by an attacker, and identifies several novel vulnerabilities that stem from erroneous sanitization procedures.

Abstract: Web applications are ubiquitous, perform mission-critical tasks, and handle sensitive user data. Unfortunately, web applications are often implemented by developers with limited security skills, and, as a result, they contain vulnerabilities. Most of these vulnerabilities stem from the lack of input validation. That is, web applications use malicious input as part of a sensitive operation, without having properly checked or sanitized the input values prior to their use. Past research on vulnerability analysis has mostly focused on identifying cases in which a web application directly uses external input in critical operations. However, little research has been performed to analyze the correctness of the sanitization process. Thus, whenever a web application applies some sanitization routine to potentially malicious input, the vulnerability analysis assumes that the result is innocuous. Unfortunately, this might not be the case, as the sanitization process itself could be incorrect or incomplete. In this paper, we present a novel approach to the analysis of the sanitization process. More precisely, we combine static and dynamic analysis techniques to identify faulty sanitization procedures that can be bypassed by an attacker. We implemented our approach in a tool, called Saner, and we applied it to a number of real-world applications. Our results demonstrate that we were able to identify several novel vulnerabilities that stem from erroneous sanitization procedures.

432 citations
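The Saner abstract's point that a sanitization routine can itself be incorrect is easy to show with a toy case. In the sketch below, `naive_sanitize` is a hypothetical, deliberately faulty sanitizer that strips script tags in a single pass, and `bypasses` is a hypothetical dynamic check that feeds it candidate payloads. This only mimics the dynamic-testing idea in miniature and is not Saner's actual procedure.

```python
from typing import Callable, List


def naive_sanitize(value: str) -> str:
    """Hypothetical faulty sanitizer: removes script tags in a single pass."""
    return value.replace("<script>", "").replace("</script>", "")


def bypasses(sanitizer: Callable[[str], str], payloads: List[str]) -> List[str]:
    """Return the payloads whose sanitized form still contains a script tag."""
    return [p for p in payloads if "<script>" in sanitizer(p).lower()]


payloads = [
    "<script>alert(1)</script>",                    # removed as intended
    "<scr<script>ipt>alert(1)</scr</script>ipt>",   # nested tags reassemble
]
found = bypasses(naive_sanitize, payloads)
```

The second payload survives because removing the inner tags splices the outer fragments back into a complete tag, and a single-pass replacement never rescans its own output. An analysis that treats every call to `naive_sanitize` as producing safe data would miss this case.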

Cites methods or result from "Pixy: a static analysis tool for de..."

...A number of past research efforts [9, 13, 17, 18, 22, 45] have focused on the problem of identifying vulnerabilities in which external input is used without any prior sanitization....
[...]
...This component is based on the open-source web vulnerability scanner called Pixy [17, 18]....
[...]
...As already mentioned, our approach is based on Pixy [17, 18], an open source static PHP analyzer that uses taint analysis for detecting XSS vulnerabilities....
[...]


References


Book

Compilers: Principles, Techniques, and Tools

Alfred V. Aho¹, Ravi Sethi¹, Jeffrey D. Ullman²•Institutions (2)

Bell Labs¹, Stanford University²

01 Jan 1986

TL;DR: This book discusses the design of a Code Generator, the role of the Lexical Analyzer, and other topics related to code generation and optimization.

Abstract:
1 Introduction 1.1 Language Processors 1.2 The Structure of a Compiler 1.3 The Evolution of Programming Languages 1.4 The Science of Building a Compiler 1.5 Applications of Compiler Technology 1.6 Programming Language Basics 1.7 Summary of Chapter 1 1.8 References for Chapter 1
2 A Simple Syntax-Directed Translator 2.1 Introduction 2.2 Syntax Definition 2.3 Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis 2.7 Symbol Tables 2.8 Intermediate Code Generation 2.9 Summary of Chapter 2
3 Lexical Analysis 3.1 The Role of the Lexical Analyzer 3.2 Input Buffering 3.3 Specification of Tokens 3.4 Recognition of Tokens 3.5 The Lexical-Analyzer Generator Lex 3.6 Finite Automata 3.7 From Regular Expressions to Automata 3.8 Design of a Lexical-Analyzer Generator 3.9 Optimization of DFA-Based Pattern Matchers 3.10 Summary of Chapter 3 3.11 References for Chapter 3
4 Syntax Analysis 4.1 Introduction 4.2 Context-Free Grammars 4.3 Writing a Grammar 4.4 Top-Down Parsing 4.5 Bottom-Up Parsing 4.6 Introduction to LR Parsing: Simple LR 4.7 More Powerful LR Parsers 4.8 Using Ambiguous Grammars 4.9 Parser Generators 4.10 Summary of Chapter 4 4.11 References for Chapter 4
5 Syntax-Directed Translation 5.1 Syntax-Directed Definitions 5.2 Evaluation Orders for SDD's 5.3 Applications of Syntax-Directed Translation 5.4 Syntax-Directed Translation Schemes 5.5 Implementing L-Attributed SDD's 5.6 Summary of Chapter 5 5.7 References for Chapter 5
6 Intermediate-Code Generation 6.1 Variants of Syntax Trees 6.2 Three-Address Code 6.3 Types and Declarations 6.4 Translation of Expressions 6.5 Type Checking 6.6 Control Flow 6.7 Backpatching 6.8 Switch-Statements 6.9 Intermediate Code for Procedures 6.10 Summary of Chapter 6 6.11 References for Chapter 6
7 Run-Time Environments 7.1 Storage Organization 7.2 Stack Allocation of Space 7.3 Access to Nonlocal Data on the Stack 7.4 Heap Management 7.5 Introduction to Garbage Collection 7.6 Introduction to Trace-Based Collection 7.7 Short-Pause Garbage Collection 7.8 Advanced Topics in Garbage Collection 7.9 Summary of Chapter 7 7.10 References for Chapter 7
8 Code Generation 8.1 Issues in the Design of a Code Generator 8.2 The Target Language 8.3 Addresses in the Target Code 8.4 Basic Blocks and Flow Graphs 8.5 Optimization of Basic Blocks 8.6 A Simple Code Generator 8.7 Peephole Optimization 8.8 Register Allocation and Assignment 8.9 Instruction Selection by Tree Rewriting 8.10 Optimal Code Generation for Expressions 8.11 Dynamic Programming Code-Generation 8.12 Summary of Chapter 8 8.13 References for Chapter 8
9 Machine-Independent Optimizations 9.1 The Principal Sources of Optimization 9.2 Introduction to Data-Flow Analysis 9.3 Foundations of Data-Flow Analysis 9.4 Constant Propagation 9.5 Partial-Redundancy Elimination 9.6 Loops in Flow Graphs 9.7 Region-Based Analysis 9.8 Symbolic Analysis 9.9 Summary of Chapter 9 9.10 References for Chapter 9
10 Instruction-Level Parallelism 10.1 Processor Architectures 10.2 Code-Scheduling Constraints 10.3 Basic-Block Scheduling 10.4 Global Code Scheduling 10.5 Software Pipelining 10.6 Summary of Chapter 10 10.7 References for Chapter 10
11 Optimizing for Parallelism and Locality 11.1 Basic Concepts 11.2 Matrix Multiply: An In-Depth Example 11.3 Iteration Spaces 11.4 Affine Array Indexes 11.5 Data Reuse 11.6 Array Data-Dependence Analysis 11.7 Finding Synchronization-Free Parallelism 11.8 Synchronization Between Parallel Loops 11.9 Pipelining 11.10 Locality Optimizations 11.11 Other Uses of Affine Transforms 11.12 Summary of Chapter 11 11.13 References for Chapter 11
12 Interprocedural Analysis 12.1 Basic Concepts 12.2 Why Interprocedural Analysis? 12.3 A Logical Representation of Data Flow 12.4 A Simple Pointer-Analysis Algorithm 12.5 Context-Insensitive Interprocedural Analysis 12.6 Context-Sensitive Pointer Analysis 12.7 Datalog Implementation by BDD's 12.8 Summary of Chapter 12 12.9 References for Chapter 12
A A Complete Front End A.1 The Source Language A.2 Main A.3 Lexical Analyzer A.4 Symbol Tables and Types A.5 Intermediate Code for Expressions A.6 Jumping Code for Boolean Expressions A.7 Intermediate Code for Statements A.8 Parser A.9 Creating the Front End
B Finding Linearly Independent Solutions
Index

8,437 citations

"Pixy: a static analysis tool for de..." refers methods in this paper

...For this, we apply the technique of data flow analysis, which is a well-understood topic in computer science and has been used in compiler optimizations for decades ([1, 17, 19])....
[...]

Book

Advanced Compiler Design and Implementation

Steven S. Muchnick

01 Jan 1997


Abstract: Advanced Compiler Design and Implementation by Steven Muchnick
Preface
1 Introduction to Advanced Topics 1.1 Review of Compiler Structure 1.2 Advanced Issues in Elementary Topics 1.3 The Importance of Code Optimization 1.4 Structure of Optimizing Compilers 1.5 Placement of Optimizations in Aggressive Optimizing Compilers 1.6 Reading Flow Among the Chapters 1.7 Related Topics Not Covered in This Text 1.8 Target Machines Used in Examples 1.9 Number Notations and Data Sizes 1.10 Wrap-Up 1.11 Further Reading 1.12 Exercises
2 Informal Compiler Algorithm Notation (ICAN) 2.1 Extended Backus-Naur Form Syntax Notation 2.2 Introduction to ICAN 2.3 A Quick Overview of ICAN 2.4 Whole Programs 2.5 Type Definitions 2.6 Declarations 2.7 Data Types and Expressions 2.8 Statements 2.9 Wrap-Up 2.10 Further Reading 2.11 Exercises
3 Symbol-Table Structure 3.1 Storage Classes, Visibility, and Lifetimes 3.2 Symbol Attributes and Symbol-Table Entries 3.3 Local Symbol-Table Management 3.4 Global Symbol-Table Structure 3.5 Storage Binding and Symbolic Registers 3.6 Approaches to Generating Loads and Stores 3.7 Wrap-Up 3.8 Further Reading 3.9 Exercises
4 Intermediate Representations 4.1 Issues in Designing an Intermediate Language 4.2 High-Level Intermediate Languages 4.3 Medium-Level Intermediate Languages 4.4 Low-Level Intermediate Languages 4.5 Multi-Level Intermediate Languages 4.6 Our Intermediate Languages: MIR, HIR, and LIR 4.7 Representing MIR, HIR, and LIR in ICAN 4.8 ICAN Naming of Data Structures and Routines that Manipulate Intermediate Code 4.9 Other Intermediate-Language Forms 4.10 Wrap-Up 4.11 Further Reading 4.12 Exercises
5 Run-Time Support 5.1 Data Representations and Instructions 5.2 Register Usage 5.3 The Local Stack Frame 5.4 The Run-Time Stack 5.5 Parameter-Passing Disciplines 5.6 Procedure Prologues, Epilogues, Calls, and Returns 5.7 Code Sharing and Position-Independent Code 5.8 Symbolic and Polymorphic Language Support 5.9 Wrap-Up 5.10 Further Reading 5.11 Exercises
6 Producing Code Generators Automatically 6.1 Introduction to Automatic Generation of Code Generators 6.2 A Syntax-Directed Technique 6.3 Introduction to Semantics-Directed Parsing 6.4 Tree Pattern Matching and Dynamic Programming 6.5 Wrap-Up 6.6 Further Reading 6.7 Exercises
7 Control-Flow Analysis 7.1 Approaches to Control-Flow Analysis 7.2 Depth-First Search, Preorder Traversal, Postorder Traversal, and Breadth-First Search 7.3 Dominators 7.4 Loops and Strongly Connected Components 7.5 Reducibility 7.6 Interval Analysis and Control Trees 7.7 Structural Analysis 7.8 Wrap-Up 7.9 Further Reading 7.10 Exercises
8 Data-Flow Analysis 8.1 An Example: Reaching Definitions 8.2 Basic Concepts: Lattices, Flow Functions, and Fixed Points 8.3 Taxonomy of Data-Flow Problems and Solution Methods 8.4 Iterative Data-Flow Analysis 8.5 Lattices of Flow Functions 8.6 Control-Tree-Based Data-Flow Analysis 8.7 Structural Analysis 8.8 Interval Analysis 8.9 Other Approaches 8.10 Du-Chains, Ud-Chains, and Webs 8.11 Static Single-Assignment (SSA) Form 8.12 Dealing with Arrays, Structures, and Pointers 8.13 Automating Construction of Data-Flow Analyzers 8.14 More Ambitious Analyses 8.15 Wrap-Up 8.16 Further Reading 8.17 Exercises
9 Dependence Analysis and Dependence Graph 9.1 Dependence Relations 9.2 Basic-Block Dependence DAGs 9.3 Dependences in Loops 9.4 Dependence Testing 9.5 Program-Dependence Graphs 9.6 Dependences Between Dynamically Allocated Objects 9.7 Wrap-Up 9.8 Further Reading 9.9 Exercises
10 Alias Analysis 10.1 Aliases in Various Real Programming Languages 10.2 The Alias Gatherer 10.3 The Alias Propagator 10.4 Wrap-Up 10.5 Further Reading 10.6 Exercises
11 Introduction to Optimization 11.1 Global Optimizations Discussed in Chapters 12 Through 18 11.2 Flow Sensitivity and May vs. Must Information 11.3 Importance of Individual Optimizations 11.4 Order and Repetition of Optimizations 11.5 Further Reading 11.6 Exercises
12 Early Optimizations 12.1 Constant-Expression Evaluation (Constant Folding) 12.2 Scalar Replacement of Aggregates 12.3 Algebraic Simplifications and Reassociation 12.4 Value Numbering 12.5 Copy Propagation 12.6 Sparse Conditional Constant Propagation 12.7 Wrap-Up 12.8 Further Reading 12.9 Exercises
13 Redundancy Elimination 13.1 Common-Subexpression Elimination 13.2 Loop-Invariant Code Motion 13.3 Partial-Redundancy Elimination 13.4 Redundancy Elimination and Reassociation 13.5 Code Hoisting 13.6 Wrap-Up 13.7 Further Reading 13.8 Exercises
14 Loop Optimizations 14.1 Induction-Variable Optimizations 14.2 Unnecessary Bounds-Checking Elimination 14.3 Wrap-Up 14.4 Further Reading 14.5 Exercises
15 Procedure Optimizations 15.1 Tail-Call Optimization and Tail-Recursion Elimination 15.2 Procedure Integration 15.3 In-Line Expansion 15.4 Leaf-Routine Optimization and Shrink Wrapping 15.5 Wrap-Up 15.6 Further Reading 15.7 Exercises
16 Register Allocation 16.1 Register Allocation and Assignment 16.2 Local Methods 16.3 Graph Coloring 16.4 Priority-Based Graph Coloring 16.5 Other Approaches to Register Allocation 16.6 Wrap-Up 16.7 Further Reading 16.8 Exercises
17 Code Scheduling 17.1 Instruction Scheduling 17.2 Speculative Loads and Boosting 17.3 Speculative Scheduling 17.4 Software Pipelining 17.5 Trace Scheduling 17.6 Percolation Scheduling 17.7 Wrap-Up 17.8 Further Reading 17.9 Exercises
18 Control-Flow and Low-Level Optimizations 18.1 Unreachable-Code Elimination 18.2 Straightening 18.3 If Simplifications 18.4 Loop Simplifications 18.5 Loop Inversion 18.6 Unswitching 18.7 Branch Optimizations 18.8 Tail Merging or Cross Jumping 18.9 Conditional Moves 18.10 Dead-Code Elimination 18.11 Branch Prediction 18.12 Machine Idioms and Instruction Combining 18.13 Wrap-Up 18.14 Further Reading 18.15 Exercises
19 Interprocedural Analysis and Optimization 19.1 Interprocedural Control-Flow Analysis: The Call Graph 19.2 Interprocedural Data-Flow Analysis 19.3 Interprocedural Constant Propagation 19.4 Interprocedural Alias Analysis 19.5 Interprocedural Optimizations 19.6 Interprocedural Register Allocation 19.7 Aggregation of Global References 19.8 Other Issues in Interprocedural Program Management 19.9 Wrap-Up 19.10 Further Reading 19.11 Exercises
20 Optimization of the Memory Hierarchy 20.1 Impact of Data and Instruction Caches 20.2 Instruction-Cache Optimization 20.3 Scalar Replacement of Array Elements 20.4 Data-Cache Optimization 20.5 Scalar vs. Memory-Oriented Optimizations 20.6 Wrap-Up 20.7 Further Reading 20.8 Exercises
21 Case Studies of Compilers and Future Trends 21.1 The Sun Compilers for SPARC 21.2 The IBM XL Compilers for the POWER and PowerPC Architectures 21.3 Digital Equipment's Compilers for Alpha 21.4 The Intel Reference Compilers for the Intel 386 Architecture 21.5 Future Trends in Compiler Design and Implementation 21.6 Further Reading
A Guide to Assembly Languages Used in This Book A.1 Sun SPARC Versions 8 and 9 Assembly Language A.2 IBM POWER and PowerPC Assembly Language A.3 DEC Alpha Assembly Language A.4 Intel 386 Architecture Assembly Language A.5 Hewlett-Packard's PA-RISC Assembly Language
B Representation of Sets, Sequences, Trees, DAGs, and Functions B.1 Representation of Sets B.2 Representation of Sequences B.3 Representation of Trees and DAGs B.4 Representation of Functions B.5 Further Reading
C Software Resources View Appendix C with live links to download sites C.1 Finding and Accessing Software on the Internet C.2 Machine Simulators C.3 Compilers C.4 Code-Generator Generators: BURG and IBURG C.5 Profiling Tools
Bibliography
Indices

2,482 citations

"Pixy: a static analysis tool for de..." refers methods in this paper

...For this, we apply the technique of data flow analysis, which is a well-understood topic in computer science and has been used in compiler optimizations for decades ([1, 17, 19])....
[...]

Book

Principles of program analysis

Flemming Nielson, Hanne Riis Nielson, Chris Hankin

22 Oct 1999

TL;DR: This book is unique in providing an overview of the four major approaches to program analysis: data flow analysis, constraint-based analysis, abstract interpretation, and type and effect systems.


Abstract: Program analysis utilizes static techniques for computing reliable information about the dynamic behavior of programs. Applications include compilers (for code improvement), software validation (for detecting errors) and transformations between data representation (for solving problems such as Y2K). This book is unique in providing an overview of the four major approaches to program analysis: data flow analysis, constraint-based analysis, abstract interpretation, and type and effect systems. The presentation illustrates the extensive similarities between the approaches, helping readers to choose the best one to utilize.


1,955 citations

"Pixy: a static analysis tool for de..." refers methods in this paper

...For this, we apply the technique of data flow analysis, which is a well-understood topic in computer science and has been used in compiler optimizations for decades ([1, 17, 19])....
[...]

Proceedings Article

Bugs as deviant behavior: a general approach to inferring errors in systems code

Dawson Engler¹, David Yu Chen¹, Seth Hallem¹, Andy Chou¹, Benjamin Chelf¹•Institutions (1)

Stanford University¹

21 Oct 2001

TL;DR: Six checkers are developed that extract beliefs by tailoring rule "templates" to a system --- for example, finding all functions that fit the rule template "a must be paired with b."

Abstract: A major obstacle to finding program errors in a real system is knowing what correctness rules the system must obey. These rules are often undocumented or specified in an ad hoc manner. This paper demonstrates techniques that automatically extract such checking information from the source code itself, rather than the programmer, thereby avoiding the need for a priori knowledge of system rules. The cornerstone of our approach is inferring programmer "beliefs" that we then cross-check for contradictions. Beliefs are facts implied by code: a dereference of a pointer, p, implies a belief that p is non-null, a call to "unlock(l)" implies that l was locked, etc. For beliefs we know the programmer must hold, such as the pointer dereference above, we immediately flag contradictions as errors. For beliefs that the programmer may hold, we can assume these beliefs hold and use a statistical analysis to rank the resulting errors from most to least likely. For example, a call to "spin_lock" followed once by a call to "spin_unlock" implies that the programmer may have paired these calls by coincidence. If the pairing happens 999 out of 1000 times, though, then it is probably a valid belief and the sole deviation a probable error. The key feature of this approach is that it requires no a priori knowledge of truth: if two beliefs contradict, we know that one is an error without knowing what the correct belief is. Conceptually, our checkers extract beliefs by tailoring rule "templates" to a system, for example, finding all functions that fit the rule template "a must be paired with b." We have developed six checkers that follow this conceptual framework. They find hundreds of bugs in real systems such as Linux and OpenBSD. From our experience, they give a dramatic reduction in the manual effort needed to check a large system. Compared to our previous work [9], these template checkers find ten to one hundred times more rule instances and derive properties we found impractical to specify manually.

775 citations
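The statistical ranking described in the abstract above (a pairing seen 999 times out of 1000 is a far stronger belief than one seen once) can be sketched with a standard z statistic for proportions. The choice of this particular statistic, the baseline rate `p0 = 0.9`, the function names, and the observation counts are all assumptions made for illustration; the abstract itself does not specify them.

```python
import math
from typing import Dict, List, Tuple


def z_score(checks: int, successes: int, p0: float = 0.9) -> float:
    """z statistic for successes out of checks against an assumed baseline rate p0."""
    return (successes / checks - p0) / math.sqrt(p0 * (1 - p0) / checks)


def rank_beliefs(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sort candidate rules by descending z score; counts maps rule -> (checks, successes)."""
    return sorted(counts, key=lambda rule: z_score(*counts[rule]), reverse=True)


# Hypothetical observations: how often a call to a was followed by a call to b.
observed = {
    "spin_lock/spin_unlock": (1000, 999),   # one deviation in 1000 uses
    "foo/bar": (1, 1),                      # paired once, possibly by coincidence
    "open/log": (10, 5),                    # paired only half the time
}
ranking = rank_beliefs(observed)
```

With these numbers the lock pairing ranks first, the single coincidental pairing second, and the half-observed pairing last, so the one deviation from the lock rule would be inspected before anything else.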

Proceedings Article

Securing web application code by static analysis and runtime protection

Yao-Wen Huang¹, Fang Yu², Christian Hang³, Chung-Hung Tsai¹, Der-Tsai Lee², Sy-Yen Kuo¹•Institutions (3)

National Taiwan University¹, Academia Sinica², RWTH Aachen University³

17 May 2004

TL;DR: A lattice-based static analysis algorithm derived from type systems and typestate is created, and its soundness is addressed, thus securing Web applications in the absence of user intervention and reducing potential runtime overhead by 98.4%.

Abstract: Security remains a major roadblock to universal acceptance of the Web for many kinds of transactions, especially since the recent sharp increase in remotely exploitable vulnerabilities has been attributed to Web application bugs. Many verification tools are discovering previously unknown vulnerabilities in legacy C programs, raising hopes that the same success can be achieved with Web applications. In this paper, we describe a sound and holistic approach to ensuring Web application security. Viewing Web application vulnerabilities as a secure information flow problem, we created a lattice-based static analysis algorithm derived from type systems and typestate, and addressed its soundness. During the analysis, sections of code considered vulnerable are instrumented with runtime guards, thus securing Web applications in the absence of user intervention. With sufficient annotations, runtime overhead can be reduced to zero. We also created a tool named WebSSARI (Web application Security by Static Analysis and Runtime Inspection) to test our algorithm, and used it to verify 230 open-source Web application projects on SourceForge.net, which were selected to represent projects of different maturity, popularity, and scale. 69 contained vulnerabilities. After notifying the developers, 38 acknowledged our findings and stated their plans to provide patches. Our statistics also show that static analysis reduced potential runtime overhead by 98.4%.

655 citations

"Pixy: a static analysis tool for de..." refers background in this paper

...The observed false positive rate is at around 50% (i.e., one false positive for each vulnerability) and therefore, low enough to permit effective security audits....
[...]