Journal ArticleDOI

Parameterized object sensitivity for points-to analysis for Java

TL;DR: This work presents object sensitivity, a new form of context sensitivity for flow-insensitive points-to analysis for Java, and proposes a parameterization framework that allows analysis designers to control the tradeoffs between cost and precision in the object-sensitive analysis.
Abstract: The goal of points-to analysis for Java is to determine the set of objects pointed to by a reference variable or a reference object field. We present object sensitivity, a new form of context sensitivity for flow-insensitive points-to analysis for Java. The key idea of our approach is to analyze a method separately for each of the object names that represent run-time objects on which this method may be invoked. To ensure flexibility and practicality, we propose a parameterization framework that allows analysis designers to control the tradeoffs between cost and precision in the object-sensitive analysis. Side-effect analysis determines the memory locations that may be modified by the execution of a program statement. Def-use analysis identifies pairs of statements that set the value of a memory location and subsequently use that value. The information computed by such analyses has a wide variety of uses in compilers and software tools. This work proposes new versions of these analyses that are based on object-sensitive points-to analysis. We have implemented two instantiations of our parameterized object-sensitive points-to analysis. On a set of 23 Java programs, our experiments show that these analyses have comparable cost to a context-insensitive points-to analysis for Java which is based on Andersen's analysis for C. Our results also show that object sensitivity significantly improves the precision of side-effect analysis and call graph construction, compared to (1) context-insensitive analysis, and (2) context-sensitive points-to analysis that models context using the invoking call site. These experiments demonstrate that object-sensitive analyses can achieve substantial precision improvement, while at the same time remaining efficient and practical.
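To make the key idea concrete, here is a small illustration of our own (the class and variable names are invented for this sketch, not taken from the paper):

```java
// Invented illustration of object sensitivity. A context-insensitive analysis
// merges all invocations of set(), so it concludes that either Box's field may
// hold either value. An object-sensitive analysis instead analyzes set() and
// get() once per receiver object name (o1, o2), keeping the two boxes apart.
class Box {
    private Object item;
    void set(Object o) { this.item = o; }
    Object get() { return this.item; }
}

public class ObjectSensitivityDemo {
    public static void main(String[] args) {
        Box b1 = new Box();              // object name o1
        Box b2 = new Box();              // object name o2
        b1.set("hello");                 // object-sensitive: o1.item -> { "hello" }
        b2.set(Integer.valueOf(42));     // object-sensitive: o2.item -> { Integer }
        Object x = b1.get();             // object-sensitive: x -> { "hello" } only;
                                         // context-insensitive: x -> { "hello", Integer }
        Object y = b2.get();
        System.out.println(x + " " + y);
    }
}
```

The precision gain carries over directly to the side-effect and def-use clients: a write through this inside set() is attributed to o1 or o2 specifically, rather than to every Box in the program.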
Citations
Book
01 Jan 2008
TL;DR: A novel technique for static race detection in Java programs, composed of a series of stages that employ a combination of static analyses to successively reduce the pairs of memory accesses potentially involved in a race.
Abstract: We present a novel technique for static race detection in Java programs, composed of a series of stages that employ a combination of static analyses to successively reduce the pairs of memory accesses potentially involved in a race. We have implemented our technique and applied it to a suite of multi-threaded Java programs. Our experiments show that it is precise, scalable, and useful, reporting tens to hundreds of serious and previously unknown concurrency bugs in large, widely-used programs with few false alarms.
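For concreteness, here is a minimal example of our own (not from the paper) of the kind of access pair such a detector reports:

```java
// Our own minimal sketch of a data race: two threads perform an unsynchronized
// read-modify-write of the same shared field, with no common lock ordering the
// accesses. A static race detector would report the pair of accesses to count.
public class RaceDemo {
    static int count = 0;                             // shared, unguarded field

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> count++);        // racy increment
        Thread t2 = new Thread(() -> count++);        // races with t1 on count
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(count);                    // may print 1 instead of 2
    }
}
```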

526 citations

Proceedings ArticleDOI
11 Nov 2014
TL;DR: The signature matching algorithm of Apposcopy uses a combination of static taint analysis and a new form of program representation called Inter-Component Call Graph to efficiently detect Android applications that have certain control- and data-flow properties.
Abstract: We present Apposcopy, a new semantics-based approach for identifying a prevalent class of Android malware that steals private user information. Apposcopy incorporates (i) a high-level language for specifying signatures that describe semantic characteristics of malware families and (ii) a static analysis for deciding if a given application matches a malware signature. The signature matching algorithm of Apposcopy uses a combination of static taint analysis and a new form of program representation called Inter-Component Call Graph to efficiently detect Android applications that have certain control- and data-flow properties. We have evaluated Apposcopy on a corpus of real-world Android applications and show that it can effectively and reliably pinpoint malicious applications that belong to certain malware families.

425 citations


Cites methods from "Parameterized object sensitivity for points-to analysis for Java"

  • ...For context-sensitivity, we use a hybrid approach that combines call-site sensitivity [30] and object-sensitivity [31]....


Proceedings ArticleDOI
Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, Omri Weisman
15 Jun 2009
TL;DR: This paper presents the design and implementation of a static Taint Analysis for Java (TAJ) that meets the requirements of industrial-scale applications, evaluates TAJ against production-level benchmarks, and compares it with alternative solutions.
Abstract: Taint analysis, a form of information-flow analysis, establishes whether values from untrusted methods and parameters may flow into security-sensitive operations. Taint analysis can detect many common vulnerabilities in Web applications, and so has attracted much attention from both the research community and industry. However, most static taint-analysis tools do not address critical requirements for an industrial-strength tool. Specifically, an industrial-strength tool must scale to large industrial Web applications, model essential Web-application code artifacts, and generate consumable reports for a wide range of attack vectors. We have designed and implemented a static Taint Analysis for Java (TAJ) that meets the requirements of industry-level applications. TAJ can analyze applications of virtually any size, as it employs a set of techniques designed to produce useful answers given limited time and space. TAJ addresses a wide variety of attack vectors, with techniques to handle reflective calls, flow through containers, nested taint, and issues in generating useful reports. This paper provides a description of the algorithms comprising TAJ, evaluates TAJ against production-level benchmarks, and compares it with alternative solutions.
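To illustrate the kind of source-to-sink flow a taint analysis tracks, here is a sketch of our own (the JDBC URL and table are invented; running it requires a JDBC driver such as H2 on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Our own sketch of a taint-analysis finding: untrusted input (a SOURCE)
// flows through string concatenation into a security-sensitive operation
// (a SINK) without sanitization, enabling SQL injection.
public class TaintDemo {
    public static void main(String[] args) throws Exception {
        String userInput = args[0];                       // SOURCE: untrusted input
        String query =
            "SELECT * FROM users WHERE name = '" + userInput + "'"; // taint propagates
        try (Connection c = DriverManager.getConnection("jdbc:h2:mem:demo"); // invented URL
             Statement s = c.createStatement()) {
            s.executeQuery(query);                        // SINK: injectable query
        }
    }
}
```

TAJ's distinguishing features, such as tracking flow through containers and reflective calls, address cases where the tainted value takes a less direct route than the concatenation shown here.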

379 citations

Journal ArticleDOI
TL;DR: The article presents the algorithms that were developed, explains how they are used to recover Intermediate Representations from executables that are similar to the IRs that would be available if one started from source code, and describes their application in the context of program understanding and automated bug hunting.
Abstract: Over the last seven years, we have developed static-analysis methods to recover a good approximation to the variables and dynamically allocated memory objects of a stripped executable, and to track the flow of values through them. The article presents the algorithms that we developed, explains how they are used to recover Intermediate Representations (IRs) from executables that are similar to the IRs that would be available if one started from source code, and describes their application in the context of program understanding and automated bug hunting. Unlike algorithms for analyzing executables that existed prior to our work, the ones presented in this article provide useful information about memory accesses, even in the absence of debugging information. The ideas described in the article are incorporated in a tool for analyzing Intel x86 executables, called CodeSurfer/x86. CodeSurfer/x86 builds a system dependence graph for the program, and provides a GUI for exploring the graph by (i) navigating its edges, and (ii) invoking operations, such as forward slicing, backward slicing, and chopping, to discover how parts of the program can impact other parts. To assess the usefulness of the IRs recovered by CodeSurfer/x86 in the context of automated bug hunting, we built a tool on top of CodeSurfer/x86, called Device-Driver Analyzer for x86 (DDA/x86), which analyzes device-driver executables for bugs. Without the benefit of either source code or symbol-table/debugging information, DDA/x86 was able to find known bugs (that had been discovered previously by source-code analysis tools), along with useful error traces, while having a low false-positive rate. DDA/x86 is the first known application of program analysis/verification techniques to industrial executables.
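The slicing operations are easiest to see at source level; below is a toy example of our own (CodeSurfer/x86 performs these operations on the IR recovered from a stripped executable, not on Java source):

```java
// Our own toy example of slicing on a dependence graph. The backward slice of
// the statement marked (*) contains exactly the statements that may affect it;
// a forward slice of "int a = 1;" contains the statements it may affect.
public class SliceDemo {
    public static void main(String[] args) {
        int a = 1;                 // in the backward slice of (*): a flows into x
        int b = 2;                 // not in the slice: b never reaches x
        int x = a + 3;             // in the slice
        int y = b * 2;             // not in the slice
        System.out.println(x);     // (*) slicing criterion
        System.out.println(y);
    }
}
```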

315 citations

Journal ArticleDOI
11 Jun 2006
TL;DR: This work has developed a refinement-based analysis that succeeds by simultaneously refining handling of method calls and heap accesses, allowing the analysis to precisely analyze important code while entirely skipping irrelevant code.
Abstract: We present a scalable and precise context-sensitive points-to analysis with three key properties: (1) filtering out of unrealizable paths, (2) a context-sensitive heap abstraction, and (3) a context-sensitive call graph. Previous work [21] has shown that all three properties are important for precisely analyzing large programs, e.g., to show safety of downcasts. Existing analyses typically give up one or more of the properties for scalability. We have developed a refinement-based analysis that succeeds by simultaneously refining handling of method calls and heap accesses, allowing the analysis to precisely analyze important code while entirely skipping irrelevant code. The analysis is demand-driven and client-driven, facilitating refinement specific to each queried variable and increasing scalability. In our experimental evaluation, our analysis proved the safety of 61% more casts than one of the most precise existing analyses across a suite of large benchmarks. The analysis checked the casts in under 13 minutes per benchmark (taking less than 1 second per query) and required only 35MB of memory, far less than previous approaches.
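A small example of the downcast-safety client driving this analysis (our own sketch; the names are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Our own sketch of the cast-safety client. Proving the cast below safe
// requires a context-sensitive heap: the ArrayList allocated inside makeList()
// must be distinguished per calling context, or the two lists are merged and
// the cast cannot be verified.
public class CastDemo {
    static List<Object> makeList(Object elem) {   // one allocation site, two contexts
        List<Object> l = new ArrayList<>();
        l.add(elem);
        return l;
    }

    public static void main(String[] args) {
        List<Object> strings = makeList("hello");
        List<Object> numbers = makeList(Integer.valueOf(7));
        String s = (String) strings.get(0);       // safe, but only provably so with
                                                  // a context-sensitive heap
        System.out.println(s);
    }
}
```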

291 citations


Cites background or methods from "Parameterized object sensitivity for points-to analysis for Java"

  • (quoted fragment of the paper's comparison table: the object-sensitive analyses [21, 23] and Naik [24] are subset-based with a context-sensitive call graph and heap, shown to scale in 1-limited and 3-limited variants respectively; the current paper has all three properties and is shown to scale)


  • ...It is difficult to compare our analysis with theirs directly, as their race detection client raises object-sensitive queries, which are in general incomparable with context-sensitive queries [23]....


  • ...1H: a 1-limited object-sensitive analysis [23] (i....


  • (quoted comparison table, flattened in extraction: it classifies prior algorithms by equality- vs. subset-based formulation, context-sensitive call graph, context-sensitive heap, and demonstrated scalability, covering Zhu/Whaley [43, 46], Whaley [44], Choi [7], Fähndrich [9], Rehof [27], Wilson [45], Cherem [6], Ruf [33], Liang [22], Guyer [12], Lattner [17], O'Callahan [26], Steensgaard (CS) [38], Wang [42], the object-sensitive analyses [21, 23], Naik [24], and the current paper; the accompanying sentence notes which of these scale to medium-sized programs)


  • ...We compare extensively with the BDD-based 1-limited object-sensitive analysis of [21] in Section 6; object-sensitive analysis [23] analyzes methods separately based on the receiver object instead of using call strings, exploiting typical object-oriented code structure for greater precision and scalability....


References
Book
01 Jan 1986
TL;DR: This book discusses the design of a code generator, the role of the lexical analyzer, and other topics related to code generation and optimization.
Abstract: (table of contents)
1 Introduction: 1.1 Language Processors 1.2 The Structure of a Compiler 1.3 The Evolution of Programming Languages 1.4 The Science of Building a Compiler 1.5 Applications of Compiler Technology 1.6 Programming Language Basics 1.7 Summary of Chapter 1 1.8 References for Chapter 1
2 A Simple Syntax-Directed Translator: 2.1 Introduction 2.2 Syntax Definition 2.3 Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis 2.7 Symbol Tables 2.8 Intermediate Code Generation 2.9 Summary of Chapter 2
3 Lexical Analysis: 3.1 The Role of the Lexical Analyzer 3.2 Input Buffering 3.3 Specification of Tokens 3.4 Recognition of Tokens 3.5 The Lexical-Analyzer Generator Lex 3.6 Finite Automata 3.7 From Regular Expressions to Automata 3.8 Design of a Lexical-Analyzer Generator 3.9 Optimization of DFA-Based Pattern Matchers 3.10 Summary of Chapter 3 3.11 References for Chapter 3
4 Syntax Analysis: 4.1 Introduction 4.2 Context-Free Grammars 4.3 Writing a Grammar 4.4 Top-Down Parsing 4.5 Bottom-Up Parsing 4.6 Introduction to LR Parsing: Simple LR 4.7 More Powerful LR Parsers 4.8 Using Ambiguous Grammars 4.9 Parser Generators 4.10 Summary of Chapter 4 4.11 References for Chapter 4
5 Syntax-Directed Translation: 5.1 Syntax-Directed Definitions 5.2 Evaluation Orders for SDD's 5.3 Applications of Syntax-Directed Translation 5.4 Syntax-Directed Translation Schemes 5.5 Implementing L-Attributed SDD's 5.6 Summary of Chapter 5 5.7 References for Chapter 5
6 Intermediate-Code Generation: 6.1 Variants of Syntax Trees 6.2 Three-Address Code 6.3 Types and Declarations 6.4 Translation of Expressions 6.5 Type Checking 6.6 Control Flow 6.7 Backpatching 6.8 Switch-Statements 6.9 Intermediate Code for Procedures 6.10 Summary of Chapter 6 6.11 References for Chapter 6
7 Run-Time Environments: 7.1 Storage Organization 7.2 Stack Allocation of Space 7.3 Access to Nonlocal Data on the Stack 7.4 Heap Management 7.5 Introduction to Garbage Collection 7.6 Introduction to Trace-Based Collection 7.7 Short-Pause Garbage Collection 7.8 Advanced Topics in Garbage Collection 7.9 Summary of Chapter 7 7.10 References for Chapter 7
8 Code Generation: 8.1 Issues in the Design of a Code Generator 8.2 The Target Language 8.3 Addresses in the Target Code 8.4 Basic Blocks and Flow Graphs 8.5 Optimization of Basic Blocks 8.6 A Simple Code Generator 8.7 Peephole Optimization 8.8 Register Allocation and Assignment 8.9 Instruction Selection by Tree Rewriting 8.10 Optimal Code Generation for Expressions 8.11 Dynamic Programming Code-Generation 8.12 Summary of Chapter 8 8.13 References for Chapter 8
9 Machine-Independent Optimizations: 9.1 The Principal Sources of Optimization 9.2 Introduction to Data-Flow Analysis 9.3 Foundations of Data-Flow Analysis 9.4 Constant Propagation 9.5 Partial-Redundancy Elimination 9.6 Loops in Flow Graphs 9.7 Region-Based Analysis 9.8 Symbolic Analysis 9.9 Summary of Chapter 9 9.10 References for Chapter 9
10 Instruction-Level Parallelism: 10.1 Processor Architectures 10.2 Code-Scheduling Constraints 10.3 Basic-Block Scheduling 10.4 Global Code Scheduling 10.5 Software Pipelining 10.6 Summary of Chapter 10 10.7 References for Chapter 10
11 Optimizing for Parallelism and Locality: 11.1 Basic Concepts 11.2 Matrix Multiply: An In-Depth Example 11.3 Iteration Spaces 11.4 Affine Array Indexes 11.5 Data Reuse 11.6 Array Data-Dependence Analysis 11.7 Finding Synchronization-Free Parallelism 11.8 Synchronization Between Parallel Loops 11.9 Pipelining 11.10 Locality Optimizations 11.11 Other Uses of Affine Transforms 11.12 Summary of Chapter 11 11.13 References for Chapter 11
12 Interprocedural Analysis: 12.1 Basic Concepts 12.2 Why Interprocedural Analysis? 12.3 A Logical Representation of Data Flow 12.4 A Simple Pointer-Analysis Algorithm 12.5 Context-Insensitive Interprocedural Analysis 12.6 Context-Sensitive Pointer Analysis 12.7 Datalog Implementation by BDD's 12.8 Summary of Chapter 12 12.9 References for Chapter 12
A A Complete Front End: A.1 The Source Language A.2 Main A.3 Lexical Analyzer A.4 Symbol Tables and Types A.5 Intermediate Code for Expressions A.6 Jumping Code for Boolean Expressions A.7 Intermediate Code for Statements A.8 Parser A.9 Creating the Front End
B Finding Linearly Independent Solutions
Index

8,437 citations

Book
12 Sep 1996
TL;DR: The Java Language Specification, Second Edition is the definitive technical reference for the Java programming language and provides complete, accurate, and detailed coverage of the syntax and semantics of the Java language.
Abstract: From the Publisher: Written by the inventors of the technology, The Java™ Language Specification, Second Edition is the definitive technical reference for the Java™ programming language. If you want to know the precise meaning of the language's constructs, this is the source for you. The book provides complete, accurate, and detailed coverage of the syntax and semantics of the Java programming language. It describes all aspects of the language, including the semantics of all types, statements, and expressions, as well as threads and binary compatibility.

4,383 citations

Proceedings ArticleDOI
25 Jan 1995
TL;DR: The paper shows how a large class of interprocedural dataflow-analysis problems can be solved precisely in polynomial time by transforming them into a special kind of graph-reachability problem.
Abstract: The paper shows how a large class of interprocedural dataflow-analysis problems can be solved precisely in polynomial time by transforming them into a special kind of graph-reachability problem. The only restrictions are that the set of dataflow facts must be a finite set, and that the dataflow functions must distribute over the confluence operator (either union or intersection). This class of problems includes—but is not limited to—the classical separable problems (also known as "gen/kill" or "bit-vector" problems), e.g., reaching definitions, available expressions, and live variables. In addition, the class of problems that our techniques handle includes many non-separable problems, including truly-live variables, copy constant propagation, and possibly-uninitialized variables. Results are reported from a preliminary experimental study of C programs (for the problem of finding possibly-uninitialized variables).
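In symbols, the distributivity restriction reads as follows for a union-based problem (our notation, with D the finite fact set and F the set of transfer functions):

```latex
% Distributivity over the confluence operator (here union), which is what
% allows the dataflow problem to be encoded as graph reachability:
\forall f \in F,\; \forall x, y \subseteq D:\quad f(x \cup y) = f(x) \cup f(y)
```

Distributivity means each f is fully determined by its results on the empty set and on the singleton fact sets, which is exactly what lets a transfer function be represented as edges between individual facts in the exploded reachability graph.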

1,154 citations

Proceedings ArticleDOI
Bjarne Steensgaard
01 Jan 1996
TL;DR: This is the asymptotically fastest non-trivial interprocedural points-to analysis algorithm yet described and is based on a non-standard type system for describing a universally valid storage shape graph for a program in linear space.
Abstract: We present an interprocedural flow-insensitive points-to analysis based on type inference methods with an almost linear time cost complexity. To our knowledge, this is the asymptotically fastest non-trivial interprocedural points-to analysis algorithm yet described. The algorithm is based on a non-standard type system. The type inferred for any variable represents a set of locations and includes a type which in turn represents a set of locations possibly pointed to by the variable. The type inferred for a function variable represents a set of functions it may point to and includes a type signature for these functions. The results are equivalent to those of a flow-insensitive alias analysis (and control flow analysis) that assumes alias relations are reflexive and transitive. This work makes three contributions. The first is a type system for describing a universally valid storage shape graph for a program in linear space. The second is a constraint system which often leads to better results than the "obvious" constraint system for the given type system. The third is an almost linear time algorithm for points-to analysis by solving a constraint system.
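The cost/precision tradeoff against subset-based (Andersen-style) analysis is visible in a few lines (our own illustration, not from the paper):

```java
// Our own illustration of unification-based points-to analysis. Each
// assignment unifies the types (location sets) of its two sides, so a, b, p,
// and q all end up in one equivalence class pointing to the merged set {A, B}.
// A subset-based (Andersen-style) analysis instead keeps q -> {A} precise.
public class UnifyDemo {
    public static void main(String[] args) {
        Object a = new Object();   // allocation site A
        Object b = new Object();   // allocation site B
        Object p = a;              // unifies type(p) with type(a)
        p = b;                     // unifies type(p) with type(b), merging A and B
        Object q = a;              // unification: q -> {A, B}; subset-based: q -> {A}
        System.out.println(p == q);
    }
}
```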

1,127 citations

01 Jan 1994
TL;DR: This thesis presents an automatic partial evaluator for the Ansi C programming language, and proves that partial evaluation at most can accomplish linear speedup, and develops an automatic speedup analysis.
Abstract: Software engineers are faced with a dilemma. They want to write general and well-structured programs that are flexible and easy to maintain. On the other hand, generality has a price: efficiency. A specialized program solving a particular problem is often significantly faster than a general program. However, the development of specialized software is time-consuming, and is likely to exceed the production capacity of today's programmers. New techniques are required to solve this so-called software crisis. Partial evaluation is a program specialization technique that reconciles the benefits of generality with efficiency. This thesis presents an automatic partial evaluator for the Ansi C programming language. The content of this thesis is analysis and transformation of C programs. We develop several analyses that support the transformation of a program into its generating extension. A generating extension is a program that produces specialized programs when executed on parts of the input. The thesis contains the following main results.
  • We develop a generating-extension transformation, and describe specialization of the various parts of C, including pointers and structures.
  • We develop constraint-based inter-procedural pointer and binding-time analyses. Both analyses are specified via non-standard type inference systems, and implemented by constraint solving.
  • We develop a side-effect and an in-use analysis. These analyses are developed in the classical monotone data-flow analysis framework. Some intriguing similarities with constraint-based analysis are observed.
  • We investigate separate and incremental program analysis and transformation. Realistic programs are structured into modules, which break down inter-procedural analyses that need global information about functions.
  • We prove that partial evaluation can accomplish at most linear speedup, and develop an automatic speedup analysis.
  • We study the stronger transformation technique driving, and initiate the development of generating super-extensions.
The developments in this thesis are supported by an implementation. Throughout the chapters we present empirical results.
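The core idea, specializing a general program to known ("static") inputs, fits in a few lines; the sketch below is our own illustration in Java rather than the thesis's C, with invented names:

```java
// Our own illustration of partial evaluation: with the exponent n known at
// specialization time, the loop in power() can be unfolded, yielding the
// specialized residual program power3(). A generating extension is a program
// that performs such specialization automatically when given the static input.
public class SpecializeDemo {
    static int power(int base, int n) {      // general program: n is dynamic
        int result = 1;
        for (int i = 0; i < n; i++) {
            result *= base;
        }
        return result;
    }

    static int power3(int base) {            // residual program specialized for n = 3
        return base * base * base;           // loop unfolded; no loop overhead remains
    }

    public static void main(String[] args) {
        System.out.println(power(5, 3));     // 125 via the general code
        System.out.println(power3(5));       // 125 via the specialized code
    }
}
```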

1,009 citations