
Showing papers on "Static program analysis published in 2011"


Proceedings ArticleDOI
21 May 2011
TL;DR: TamiFlex makes static analyses sound with respect to recorded program runs; on large interactive applications it significantly improves the code coverage of static analyses, and on the DaCapo benchmarks the approach even appears complete (the inserted runtime checks issue no warning), enabling sound static whole-program analyses on DaCapo for the first time.
Abstract: Static program analyses and transformations for Java face many problems when analyzing programs that use reflection or custom class loaders: How can a static analysis know which reflective calls the program will execute? How can it get hold of classes that the program loads from remote locations or even generates on the fly? And if the analysis transforms classes, how can these classes be re-inserted into a program that uses custom class loaders? In this paper, we present TamiFlex, a tool chain that offers a partial but often effective solution to these problems. With TamiFlex, programmers can use existing static-analysis tools to produce results that are sound at least with respect to a set of recorded program runs. TamiFlex inserts runtime checks into the program that warn the user in case the program executes reflective calls that the analysis did not take into account. TamiFlex further allows programmers to re-insert offline-transformed classes into a program. We evaluate TamiFlex in two scenarios: benchmarking with the DaCapo benchmark suite and analysing large-scale interactive applications. For the latter, TamiFlex significantly improves code coverage of the static analyses, while for the former our approach even appears complete: the inserted runtime checks issue no warning. Hence, for the first time, TamiFlex enables sound static whole-program analyses on DaCapo. During this process, TamiFlex usually incurs less than 10% runtime overhead.
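A minimal sketch of the record-and-check idea, assuming a hypothetical ReflectionLog wrapper through which reflective calls are routed; TamiFlex's real agent works at the bytecode level, and none of these names are its actual API:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of record-and-check for reflective calls.
public class ReflectionLog {
    // Reflective targets observed during the recorded training runs.
    private static final Set<String> recordedTargets = ConcurrentHashMap.newKeySet();
    private static volatile boolean recording = true;

    public static Object loggedInvoke(Method m, Object receiver, Object... args)
            throws Exception {
        String target = m.getDeclaringClass().getName() + "." + m.getName();
        if (recording) {
            recordedTargets.add(target);  // record phase: remember the call target
        } else if (!recordedTargets.contains(target)) {
            // check phase: the static analysis only covered recorded targets,
            // so an unrecorded reflective call voids its soundness guarantee
            System.err.println("WARNING: unrecorded reflective call: " + target);
        }
        return m.invoke(receiver, args);
    }

    public static void stopRecording() { recording = false; }
}
```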

239 citations


Book ChapterDOI
26 Mar 2011
TL;DR: This work proposes a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems: a purely static approach based on predictive modelling and program features that, across a suite of 47 benchmarks, outperforms dynamic, multi-core-only, and GPU-only alternatives.
Abstract: Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any heterogeneous system is difficult and depends on the hardware and software implementation. We propose a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems. We develop a purely static approach based on predictive modelling and program features. When evaluated over a suite of 47 benchmarks, our model achieves a speedup of 1.57 over a state-of-the-art dynamic run-time approach, a speedup of 3.02 over a purely multi-core approach and 1.55 over the performance achieved by using just the GPU.
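A toy sketch of the underlying idea: static kernel features are mapped to a predicted work split. The features, weights, and linear form here are invented for illustration and merely stand in for the paper's trained predictive model:

```java
// Toy feature-based partition prediction for a CPU-GPU system.
public class PartitionPredictor {
    static final class KernelFeatures {
        final double computePerMemOp;   // arithmetic ops per memory op
        final double transferMegabytes; // data to copy to the GPU
        final double branchDensity;     // fraction of divergent branches
        KernelFeatures(double c, double t, double b) {
            computePerMemOp = c; transferMegabytes = t; branchDensity = b;
        }
    }

    /** Predicted fraction of the work-items to run on the GPU, in [0, 1]. */
    static double gpuFraction(KernelFeatures f) {
        // Hypothetical linear model: compute-heavy, transfer-light,
        // branch-free kernels favor the GPU.
        double score = 0.5 + 0.10 * f.computePerMemOp
                           - 0.05 * f.transferMegabytes
                           - 0.30 * f.branchDensity;
        return Math.max(0.0, Math.min(1.0, score));
    }

    public static void main(String[] args) {
        KernelFeatures matmulLike = new KernelFeatures(8.0, 2.0, 0.0);
        System.out.printf("predicted GPU share: %.2f%n", gpuFraction(matmulLike));
    }
}
```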

220 citations


Proceedings ArticleDOI
21 Mar 2011
TL;DR: This paper represents an SPL in a form where conventional static program analysis techniques can be applied to find irrelevant features for a test, and uses this information to reduce the combinatorial number of SPL programs to examine.
Abstract: A Software Product Line (SPL) is a family of programs where each program is defined by a unique combination of features. Testing or checking properties of an SPL is hard as it may require the examination of a combinatorial number of programs. In reality, however, features are often irrelevant for a given test - they augment, but do not change, existing behavior, making many feature combinations unnecessary as far as testing is concerned. In this paper we show how to reduce the amount of effort in testing an SPL. We represent an SPL in a form where conventional static program analysis techniques can be applied to find irrelevant features for a test. We use this information to reduce the combinatorial number of SPL programs to examine.
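A minimal illustration, with invented feature names, of what makes a feature irrelevant to a test: it augments observable behavior without changing the value the test checks, so the configurations to examine can be halved:

```java
// Invented two-feature product line: LOGGING augments behavior but never
// changes the value this test observes, so the test result is independent
// of it and only the other feature needs to be varied.
public class SplExample {
    static boolean FEAT_DISCOUNT = true; // the feature this test exercises
    static boolean FEAT_LOGGING  = true; // irrelevant to this test

    static int price(int base) {
        int p = FEAT_DISCOUNT ? base - 10 : base;
        if (FEAT_LOGGING) {
            System.out.println("computed price " + p); // augments, never changes p
        }
        return p;
    }

    public static void main(String[] args) {
        // A static analysis can establish that price()'s return value does not
        // depend on FEAT_LOGGING, halving the configurations this test covers.
        System.out.println(price(100) == 90 ? "test passed" : "test FAILED");
    }
}
```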

130 citations


Proceedings ArticleDOI
17 Oct 2011
TL;DR: This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behavior of the obfuscated code, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis.
Abstract: When new malware are discovered, it is important for researchers to analyze and understand them as quickly as possible. This task has been made more difficult in recent years as researchers have seen an increasing use of virtualization-obfuscated malware code. These programs are difficult to comprehend and reverse engineer, since they are resistant to both static and dynamic analysis techniques. Current approaches to dealing with such code first reverse-engineer the byte code interpreter, then use this to work out the logic of the byte code program. This outside-in approach produces good results when the structure of the interpreter is known, but cannot be applied to all cases. This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behavior of the obfuscated code. This inside-out approach requires fewer assumptions, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis. Results from a prototype tool on real-world malicious code are encouraging.
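A sketch of the inside-out intuition over an invented trace format: walking the executed steps backwards from the values the program makes observable, only steps whose results flow into those values are kept, discarding interpreter bookkeeping:

```java
import java.util.*;

// Backward relevance pass over a (hypothetical) execution trace.
public class RelevanceSlicer {
    record Step(int id, String def, List<String> uses) {}

    static Set<Integer> relevant(List<Step> trace, Set<String> observed) {
        Set<String> live = new HashSet<>(observed); // values still awaiting a definition
        Set<Integer> keep = new TreeSet<>();
        for (int i = trace.size() - 1; i >= 0; i--) {
            Step s = trace.get(i);
            if (live.remove(s.def())) {  // this step defined an observable-relevant value
                keep.add(s.id());
                live.addAll(s.uses());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Step> trace = List.of(
            new Step(1, "vpc", List.of("vpc")),   // virtual-PC bookkeeping
            new Step(2, "x",   List.of("input")),
            new Step(3, "vpc", List.of("vpc")),   // more bookkeeping
            new Step(4, "out", List.of("x")));    // value passed to a system call
        System.out.println(relevant(trace, Set.of("out"))); // prints [2, 4]
    }
}
```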

127 citations


Journal ArticleDOI
TL;DR: This paper proposes three views (realized by a tool called TeMo) that combine information from a software project's versioning system, the size of the various artifacts, and the test coverage reports; the views can recognize different co-evolution scenarios and yield relevant observations for both developers and test engineers.
Abstract: Many software production processes advocate rigorous development testing alongside functional code writing, which implies that both test code and production code should co-evolve. To gain insight into the nature of this co-evolution, this paper proposes three views (realized by a tool called TeMo) that combine information from a software project's versioning system, the size of the various artifacts, and the test coverage reports. We validate these views against two open source and one industrial software project and evaluate our results with the help of log messages, code inspections, and the original developers of the software system. With these views we could recognize different co-evolution scenarios (i.e., synchronous and phased) and make relevant observations for both developers and test engineers.

122 citations


Proceedings ArticleDOI
25 Sep 2011
TL;DR: The results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes and improves the performance of prediction models to classify Java interfaces into change-prone and not change-prone.
Abstract: Recent empirical studies have investigated the use of source code metrics to predict the change- and defect-proneness of source code files and classes. While results showed strong correlations and good predictive power of these metrics, they do not distinguish between interface, abstract or concrete classes. In particular, interfaces declare contracts that are meant to remain stable during the evolution of a software system while the implementation in concrete classes is more likely to change. This paper aims at investigating to which extent the existing source code metrics can be used for predicting change-prone Java interfaces. We empirically investigate the correlation between metrics and the number of fine-grained source code changes in interfaces of ten Java open-source systems. Then, we evaluate the metrics to calculate models for predicting change-prone Java interfaces. Our results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes. This metric also improves the performance of prediction models to classify Java interfaces into change-prone and not change-prone.

121 citations


Proceedings ArticleDOI
21 May 2011
TL;DR: This paper shares experiences in designing a static change impact analysis framework for large and evolving industrial software systems; the framework is implemented as a tool called Imp and applied to an industrial codebase with over a million lines of C/C++ code, with promising empirical results.
Abstract: Change impact analysis, i.e., knowing the potential consequences of a software change, is critical for the risk analysis, developer effort estimation, and regression testing of evolving software. Static program slicing is an attractive option for enabling routine change impact analysis for newly committed changesets during daily software build. For small programs with a few thousand lines of code, static program slicing scales well and can assist precise change impact analysis. However, as we demonstrate in this paper, static program slicing faces unique challenges when applied routinely on large and evolving industrial software systems. Despite recent advances in static program slicing, to our knowledge, there have been no studies of static change impact analysis applied on large and evolving industrial software systems. In this paper, we share our experiences in designing a static change impact analysis framework for such software systems. We have implemented our framework as a tool called Imp and have applied Imp on an industrial codebase with over a million lines of C/C++ code with promising empirical results.
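A minimal sketch of the forward-impact idea: every entity reachable from a changed entity along dependence edges is potentially impacted. Building those edges precisely and scalably is the hard part a tool like Imp addresses; the graph below is hand-made for illustration:

```java
import java.util.*;

// Change impact as reachability over a dependence graph.
public class ChangeImpact {
    static Set<String> impacted(Map<String, List<String>> dependedOnBy, Set<String> changed) {
        Set<String> seen = new HashSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            for (String next : dependedOnBy.getOrDefault(work.pop(), List.of())) {
                if (seen.add(next)) work.push(next); // newly impacted entity
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // edge u -> v means "v depends on u"
        Map<String, List<String>> dependedOnBy = Map.of(
            "parse()",   List.of("compile()"),
            "compile()", List.of("build()", "runTests()"));
        System.out.println(impacted(dependedOnBy, Set.of("parse()")));
    }
}
```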

119 citations


Proceedings ArticleDOI
21 May 2011
TL;DR: This paper presents a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM and shows that SCC outperforms LM for learning bug prediction models.
Abstract: A significant amount of research effort has been dedicated to learning prediction models that allow project managers to efficiently allocate resources to those parts of a software system that most likely are bug-prone and therefore critical. Prominent measures for building bug prediction models are product measures (e.g., complexity) and process measures (e.g., code churn). Code churn in terms of lines modified (LM) and past changes turned out to be significant indicators of bugs. However, these measures are rather imprecise and do not reflect all the detailed changes of particular source code entities during maintenance activities. In this paper, we explore the advantage of using fine-grained source code changes (SCC) for bug prediction. SCC captures the exact code changes and their semantics down to statement level. We present a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM. The results show that SCC outperforms LM for learning bug prediction models.

110 citations


Proceedings ArticleDOI
17 Jul 2011
TL;DR: The proposed technique splits the generated path conditions into constraints that a decision procedure can solve and complex non-linear constraints with uninterpreted functions representing external library calls, enabling classical symbolic execution to cover paths that other dynamic symbolic execution approaches cannot.
Abstract: Symbolic execution is a powerful static program analysis technique that has been used for the automated generation of test inputs. Directed Automated Random Testing (DART) is a dynamic variant of symbolic execution that initially uses random values to execute a program and collects symbolic path conditions during the execution. These conditions are then used to produce new inputs to execute the program along different paths. It has been argued that DART can handle situations where classical static symbolic execution fails due to incompleteness in decision procedures and its inability to handle external library calls. We propose here a technique that mitigates these previous limitations of classical symbolic execution. The proposed technique splits the generated path conditions into (a) constraints that can be solved by a decision procedure and (b) complex non-linear constraints with uninterpreted functions to represent external library calls. The solutions generated from the decision procedure are used to simplify the complex constraints and the resulting path conditions are checked again for satisfiability. We also present heuristics that can further improve our technique. We show how our technique can enable classical symbolic execution to cover paths that other dynamic symbolic execution approaches cannot cover. Our method has been implemented within the Symbolic PathFinder tool and has been applied to several examples, including two from the NASA domain.
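A toy illustration of the splitting step, with brute-force enumeration standing in for the decision procedure and an invented libraryHash standing in for an external call that is modelled as uninterpreted during path collection:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Phase 1 solves the decidable conjuncts; phase 2 substitutes each solution
// into the conjunct that calls the external function and rechecks it.
public class SplitSolver {
    static int libraryHash(int x) { return (x * 31) & 0xff; } // external call

    public static void main(String[] args) {
        // Path condition: x > 3 && x < 6 && libraryHash(x) == libraryHash(5)
        List<IntPredicate> decidable = List.of(x -> x > 3, x -> x < 6);
        IntPredicate complex = x -> libraryHash(x) == libraryHash(5);

        List<Integer> solutions = new ArrayList<>();   // phase 1: decidable part
        for (int x = -16; x <= 16; x++) {
            final int xi = x;
            if (decidable.stream().allMatch(p -> p.test(xi))) solutions.add(xi);
        }
        for (int xi : solutions) {                     // phase 2: substitute and recheck
            if (complex.test(xi)) {
                System.out.println("satisfying input: x = " + xi); // prints x = 5
                return;
            }
        }
        System.out.println("no solution found among candidates");
    }
}
```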

101 citations


Book ChapterDOI
18 May 2011
TL;DR: This paper introduces a novel code obfuscation scheme that applies the concept of software diversification to the control flow graph of the software to enhance its complexity; the scheme improves resistance against both static disassembling tools and dynamic reverse engineering at a reasonable performance penalty.
Abstract: The process of reverse engineering allows attackers to understand the behavior of software and extract proprietary algorithms and data structures (e.g. cryptographic keys) from it. Code obfuscation is frequently employed to mitigate this risk. However, while most of today's obfuscation methods are targeted against static reverse engineering, where the attacker analyzes the code without actually executing it, they are still insecure against dynamic analysis techniques, where the behavior of the software is inspected at runtime. In this paper, we introduce a novel code obfuscation scheme that applies the concept of software diversification to the control flow graph of the software to enhance its complexity. Our approach aims at making dynamic reverse engineering considerably harder as the information an attacker can retrieve from the analysis of a single run of the program with a certain input is useless for understanding the program behavior on other inputs. Based on a prototype implementation we show that our approach improves resistance against both static disassembling tools and dynamic reverse engineering at a reasonable performance penalty.
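A toy illustration of the diversification idea, with invented variants: several semantically equivalent implementations of the same computation coexist and the input selects among them, so a trace of one run reveals little about the control flow of another:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Input-dependent dispatch over semantically equivalent variants of abs().
public class DiversifiedAbs {
    static final List<IntUnaryOperator> VARIANTS = List.of(
        x -> x < 0 ? -x : x,                    // variant 0: plain branch
        x -> (x ^ (x >> 31)) - (x >> 31),       // variant 1: branch-free bit trick
        x -> (int) Math.sqrt((double) x * x));  // variant 2: arithmetic detour

    static int abs(int x) {
        // different inputs exercise different control-flow variants
        int variant = Math.floorMod(x * 2654435761L, VARIANTS.size());
        return VARIANTS.get(variant).applyAsInt(x);
    }

    public static void main(String[] args) {
        for (int x : new int[] { -7, 3, -12 })
            System.out.println("abs(" + x + ") = " + abs(x));
    }
}
```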

87 citations


Proceedings ArticleDOI
28 Mar 2011
TL;DR: This paper proposes a static program analysis to automatically detect asynchronism-related bugs in web applications and develops a technique that extracts client-side JavaScript code from server-side scripts; an evaluation on real-world applications shows the analysis can effectively identify real bugs.
Abstract: Ajax is becoming more and more important for web applications that care about client-side user experience. It allows sending requests asynchronously, without blocking clients from continuing execution. Callback functions are only executed upon receiving the responses. While this mechanism makes browsing a smooth experience, it may cause severe problems in the presence of unexpected network latency, due to the non-determinism of asynchronism. In this paper, we demonstrate the possible problems caused by the asynchronism and propose a static program analysis to automatically detect such bugs in web applications. As client-side Ajax code is often wrapped in server-side scripts, we also develop a technique that extracts client-side JavaScript code from server-side scripts. We evaluate our technique on a number of real-world web applications. Our results show that it can effectively identify real bugs. We also discuss possible ways to avoid such bugs.
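The bug class transliterated into Java futures (the paper targets JavaScript/Ajax; the timings and names here are invented): the second callback assumes state installed by the first, so unexpected latency flips the callback order and crashes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// The "loadData" handler dereferences state that only the slower "login"
// handler installs, so the reordered callbacks trigger a NullPointerException.
public class AsyncRaceDemo {
    static volatile String session; // set by the login callback

    static CompletableFuture<Void> request(long latencyMs, Runnable callback) {
        return CompletableFuture.runAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(latencyMs); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            callback.run();
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> login = request(200, () -> session = "user42");
        CompletableFuture<Void> load  = request(50, () ->
            System.out.println("loading data for session of length " + session.length()));
        CompletableFuture.allOf(login, load).join(); // throws: load's callback ran first
    }
}
```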

Proceedings ArticleDOI
25 Sep 2011
TL;DR: Building on a successful approach to splitting identifiers, an implementation of an expansion algorithm is presented; experiments find that up to 66% of identifiers are correctly expanded, which is within about 20% of the current system's best-case performance.
Abstract: Maintaining modern software requires significant tool support. Effective tools exploit a variety of information and techniques to aid a software maintainer. One area of recent interest in tool development exploits the natural language information found in source code. Such Information Retrieval (IR) based tools complement traditional static analysis tools and have tackled problems, such as feature location, that otherwise require considerable human effort. To reap the full benefit of IR-based techniques, the language used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) must be consistent. Unfortunately, there is a significant proportion of invented vocabulary in source code. Vocabulary normalization aligns the vocabulary found in the source code with that found in other software artifacts. Most existing work related to normalization has focused on splitting an identifier into its constituent parts. The next step is to expand each part into a (dictionary) word that matches the vocabulary used in other software artifacts. Building on a successful approach to splitting identifiers, an implementation of an expansion algorithm is presented. Experiments on two systems find that up to 66% of identifiers are correctly expanded, which is within about 20% of the current system's best-case performance. Not only is this performance comparable to previous techniques, but the result is achieved in the absence of special purpose rules and not limited to restricted syntactic contexts. Results from these experiments also show the impact that varying levels of documentation (including both internal documentation such as the requirements and design, and external, or user-level, documentation) have on the algorithm's performance.
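A minimal sketch of the normalization pipeline, with a canned dictionary standing in for vocabulary mined from the project's artifacts; the splitting regex and subsequence-based expansion are simplified stand-ins for the paper's algorithm:

```java
import java.util.ArrayList;
import java.util.List;

// Split an identifier at underscores and camel-case boundaries, then expand
// each part to the first dictionary word it abbreviates.
public class IdentifierExpander {
    static final List<String> DICT =
        List.of("average", "counter", "length", "message");

    // part abbreviates word if it is a subsequence sharing the first letter
    static boolean abbreviates(String part, String word) {
        if (part.isEmpty() || part.charAt(0) != word.charAt(0)) return false;
        int i = 0;
        for (char c : word.toCharArray())
            if (i < part.length() && c == part.charAt(i)) i++;
        return i == part.length();
    }

    static List<String> expand(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split("_|(?<=[a-z])(?=[A-Z])")) {
            String p = part.toLowerCase();
            words.add(DICT.stream().filter(w -> abbreviates(p, w)).findFirst().orElse(p));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(expand("avgLen"));  // [average, length]
        System.out.println(expand("msg_cnt")); // [message, counter]
    }
}
```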

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper presents a novel search technique that uses information such as the position of the query word and its semantic role to calculate relevance and shows that this approach is more consistently effective than three other state of the art search techniques.
Abstract: As software continues to grow, locating code for maintenance tasks becomes increasingly difficult. Software search tools help developers find source code relevant to their maintenance tasks. One major challenge to successful search tools is locating relevant code when the user's query contains words with multiple meanings or words that occur frequently throughout the program. Traditional search techniques, which treat each word individually, are unable to distinguish relevant and irrelevant methods under these conditions. In this paper, we present a novel search technique that uses information such as the position of the query word and its semantic role to calculate relevance. Our evaluation shows that this approach is more consistently effective than three other state of the art search techniques.

Proceedings ArticleDOI
21 Mar 2011
TL;DR: The aim of this paper is to describe our experience using different tools for code smell detection and to outline the main differences among them and the differing results they produce.
Abstract: Detecting code smells in the code and consequently applying the right refactoring steps when necessary is very important to improve the quality of the code. Different tools have been proposed for code smell detection, each one characterized by particular features. The aim of this paper is to describe our experience using different tools for code smell detection. We outline the main differences among them and the different results we obtained.

Proceedings ArticleDOI
09 Sep 2011
TL;DR: This study looks at different kinds of coupling concepts between source code entities, including structural dependencies, fan-out similarity, evolutionary coupling, code ownership, code clones, and semantic similarity, and provides insights into how to support developers in modularizing software systems.
Abstract: Software systems are modularized to make their inherent complexity manageable. While there exists a set of well-known principles that may guide software engineers to design the modules of a software system, we do not know which principles are followed in practice. In a study based on 16 open source projects, we look at different kinds of coupling concepts between source code entities, including structural dependencies, fan-out similarity, evolutionary coupling, code ownership, code clones, and semantic similarity. The congruence between these coupling concepts and the modularization of the system hints at the modularity principles used in practice. Furthermore, the results provide insights on how to support developers to modularize software systems.

Journal ArticleDOI
TL;DR: The achieved results confirm the conjecture that providing developers with the similarity between code and high-level artifacts helps to improve the quality of the source code lexicon, and they indicate the potential usefulness of COCONUT as a feature for software development environments.
Abstract: The paper presents an approach helping developers to maintain source code identifiers and comments consistent with high-level artifacts. Specifically, the approach computes and shows the textual similarity between source code and related high-level artifacts. Our conjecture is that developers are induced to improve the source code lexicon, i.e., terms used in identifiers or comments, if the software development environment provides information about the textual similarity between the source code under development and the related high-level artifacts. The proposed approach also recommends candidate identifiers built from high-level artifacts related to the source code under development and has been implemented as an Eclipse plug-in, called COde Comprehension Nurturant Using Traceability (COCONUT). The paper also reports on two controlled experiments performed with master's and bachelor's students. The goal of the experiments is to evaluate the quality of identifiers and comments (in terms of their consistency with high-level artifacts) in the source code produced when using or not using COCONUT. The achieved results confirm our conjecture that providing the developers with similarity between code and high-level artifacts helps to improve the quality of source code lexicon. This indicates the potential usefulness of COCONUT as a feature for software development environments.

Proceedings ArticleDOI
22 Jun 2011
TL;DR: A novel static concept location technique is proposed that leverages both the textual information present in the code and the structural dependencies between source code elements, clustering the source code with the Border Flow algorithm over combined structural and textual data.
Abstract: One of the most common comprehension activities undertaken by developers is concept location in source code. In the context of software change, concept location means finding locations in source code where changes are to be made in response to a modification request. Static techniques for concept location usually rely on searching the source code using textual information or on navigating the dependencies among software elements. In this paper we propose a novel static concept location technique, which leverages both the textual information present in the code and the structural dependencies between source code elements. The technique employs a textual search within the source code, which is clustered using the Border Flow algorithm based on combined structural and textual data. We evaluated the technique against a text search based baseline approach using data on almost 200 changes from five software systems. The results indicate that the new approach outperforms the baseline and that improvements are still possible.

Proceedings Article
15 Jun 2011
TL;DR: CODE is a tool that automatically detects software configuration errors by identifying invariant configuration access rules that predict which access events follow which contexts; using these rules, it can sift through a voluminous number of events and detect deviant program executions.
Abstract: Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected without user interference. This paper presents CODE, a tool that automatically detects software configuration errors. Our approach is based on identifying invariant configuration access rules that predict what access events follow what contexts. It requires no source code, application-specific semantics, or heavyweight program analysis. Using these rules, CODE can sift through a voluminous number of events and detect deviant program executions. This is in contrast to previous approaches that focus only on diagnosis. In our experiments, CODE successfully detected a real configuration error in one of our deployment machines, in addition to 20 user-reported errors that we reproduced in our test environment. When analyzing month-long event logs from both user desktops and production servers, CODE yielded a low false positive rate. The efficiency of CODE makes it feasible to be deployed as a practical management tool with low overhead.
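A sketch of the rule-learning idea with a deliberately simple context (the previous event); CODE's actual contexts and matching are richer, and all event names here are invented:

```java
import java.util.*;

// Learn "event X invariably follows context C" rules from correct runs,
// then flag runs that deviate from a learned rule.
public class ConfigRuleChecker {
    static Map<String, String> learn(List<List<String>> goodRuns) {
        Map<String, String> follows = new HashMap<>();
        Map<String, Boolean> consistent = new HashMap<>();
        for (List<String> run : goodRuns)
            for (int i = 1; i < run.size(); i++) {
                String ctx = run.get(i - 1), ev = run.get(i);
                String prev = follows.putIfAbsent(ctx, ev);
                if (prev != null && !prev.equals(ev)) consistent.put(ctx, false);
            }
        follows.keySet().removeIf(k -> Boolean.FALSE.equals(consistent.get(k)));
        return follows; // only the invariant rules survive
    }

    public static void main(String[] args) {
        var rules = learn(List.of(
            List.of("open(app.conf)", "read(port)", "read(host)"),
            List.of("open(app.conf)", "read(port)", "read(host)")));
        List<String> suspect = List.of("open(app.conf)", "read(host)");
        for (int i = 1; i < suspect.size(); i++) {
            String expected = rules.get(suspect.get(i - 1));
            if (expected != null && !expected.equals(suspect.get(i)))
                System.out.println("deviation: expected " + expected
                        + ", saw " + suspect.get(i));
        }
    }
}
```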

Book ChapterDOI
14 Jul 2011
TL;DR: This paper describes a tool that applies theorem-proving technology to synthesize code fragments that use given library functions; the tool has proven useful for synthesizing code fragments for common programming tasks and is a good platform for exploring software synthesis techniques.
Abstract: We describe a tool that applies theorem proving technology to synthesize code fragments that use given library functions. To determine candidate code fragments, our approach takes into account polymorphic type constraints as well as test cases. Our tool interactively displays a ranked list of suggested code fragments that are appropriate for the current program point. We have found our system to be useful for synthesizing code fragments for common programming tasks, and we believe it is a good platform for exploring software synthesis techniques.

Proceedings ArticleDOI
10 Nov 2011
TL;DR: This paper describes active code completion, an architecture that allows library developers to introduce interactive and highly specialized code generation interfaces, called palettes, directly into the editor, and presents one such system, named Graphite, built for the Eclipse Java development environment.
Abstract: Software developers today make heavy use of the code completion features available in modern code editors [1]. By navigating and selecting from a floating menu containing the names of variables, fields, methods, types and code snippets, a developer can avoid many common spelling and logic errors, avoid unnecessary keystrokes, and explore unfamiliar APIs without leaving the editor window. To ensure that the items featured in this menu are relevant, the editor conducts a static analysis of the surrounding code context.

Proceedings ArticleDOI
05 Jun 2011
TL;DR: Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.
Abstract: This paper presents an approach for accurately estimating the execution time of parallel software components in complex embedded systems. Timing annotations obtained from highly optimized binary code are added to the source code of software components which is then integrated into a SystemC transaction-level simulation. This approach allows a fast evaluation of software execution times while being as accurate as conventional instruction set simulators. By simulating binary-level control flow in parallel to the original functionality of the software, even compiler optimizations heavily modifying the structure of the generated code can be modeled accurately. Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.

Proceedings ArticleDOI
Yu Pei, Yi Wei, Carlo A. Furia, Martin Nordio, Bertrand Meyer
06 Nov 2011
TL;DR: Applications of AutoFix-E2 to general-purpose software, such as a library to manipulate documents, show that the approach provides an improvement over previous techniques, in particular purely model-based approaches.
Abstract: Initial research in automated program fixing has generally limited itself to specific areas, such as data structure classes with carefully designed interfaces, and relied on simple approaches. To provide high-quality fix suggestions in a broad area of applicability, the present work relies on the presence of contracts in the code, and on the availability of static and dynamic analyses to gather evidence on the values taken by expressions derived from the code. The ideas have been built into the AutoFix-E2 automatic fix generator. Applications of AutoFix-E2 to general-purpose software, such as a library to manipulate documents, show that the approach provides an improvement over previous techniques, in particular purely model-based approaches.

Proceedings ArticleDOI
17 Jul 2011
TL;DR: The string-analysis algorithm is implemented and used to augment an industrial security analysis for Web applications by automatically detecting and verifying sanitizers: methods that eliminate malicious patterns from untrusted strings, making those strings safe to use in security-sensitive operations.
Abstract: We propose a novel technique for statically verifying the strings generated by a program. The verification is conducted by encoding the program in Monadic Second-Order Logic (M2L). We use M2L to describe constraints among program variables and to abstract built-in string operations. Once we encode a program in M2L, a theorem prover for M2L, such as MONA, can automatically check if a string generated by the program satisfies a given specification, and if not, exhibit a counterexample. With this approach, we can naturally encode relationships among strings, accounting also for cases in which a program manipulates strings using indices. In addition, our string analysis is path sensitive in that it accounts for the effects of string and Boolean comparisons, as well as regular-expression matches. We have implemented our string-analysis algorithm and used it to augment an industrial security analysis for Web applications by automatically detecting and verifying sanitizers: methods that eliminate malicious patterns from untrusted strings, making those strings safe to use in security-sensitive operations. On the 8 benchmarks we analyzed, our string analyzer discovered 128 previously unknown sanitizers, compared to 71 sanitizers detected by a previously presented string analysis.

Proceedings ArticleDOI
11 Apr 2011
TL;DR: A controlled experiment with a large real-world SPL with over 160,000 lines of code and 340 features shows that background colors can improve program comprehension in large SPLs.
Abstract: Background: Software product line engineering provides an effective mechanism to implement variable software. However, the usage of preprocessors, which is typical in industry, is heavily criticized, because it often leads to obfuscated code. Using background colors to support comprehensibility has been shown to be effective; however, scalability to large software product lines (SPLs) is questionable. Aim: Our goal is to implement and evaluate scalable usage of background colors for industrial-sized SPLs. Method: We designed and implemented scalable concepts in a tool called FeatureCommander. To evaluate its effectiveness, we conducted a controlled experiment with a large real-world SPL with over 160,000 lines of code and 340 features. We used a within-subjects design with treatments colors and no colors. We compared correctness and response time of tasks for both treatments. Results: For certain kinds of tasks, background colors improve program comprehension. Furthermore, subjects generally favor background colors. Conclusion: We show that background colors can improve program comprehension in large SPLs. Based on these encouraging results, we will continue our work improving program comprehension in large SPLs.

Proceedings ArticleDOI
05 Sep 2011
TL;DR: A tool named Historage is presented that provides entire histories of fine-grained entities in Java, such as methods, constructors, and fields, and that traces entity histories across renaming changes.
Abstract: Software systems are changed continuously to adapt to the environment, correct faults, improve performance, and so on. For in-depth analysis related to software evolution, it is informative to obtain the histories of fine-grained source code entities. This paper presents a tool named Historage that can provide entire histories of fine-grained entities in Java, such as methods, constructors, and fields. A characteristic of Historage is the ability to trace entity histories across renaming changes. We applied our technique to five open source software projects to quantitatively evaluate the renaming change identification.
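A sketch of rename-aware tracking under invented thresholds and a toy token-set similarity: a method that disappears under its old name is matched to the new method with the most similar body, so its history continues across the rename:

```java
import java.util.*;

// Match a method missing from the new version to its most similar successor.
public class RenameTracker {
    // Jaccard similarity over the bodies' token sets (a crude stand-in).
    static double similarity(String a, String b) {
        Set<String> ta = new HashSet<>(List.of(a.split("\\W+")));
        Set<String> tb = new HashSet<>(List.of(b.split("\\W+")));
        Set<String> inter = new HashSet<>(ta); inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta); union.addAll(tb);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Map<String, String> oldVersion = Map.of(
            "calcSum", "int s = 0; for (int v : values) s += v; return s;");
        Map<String, String> newVersion = Map.of(
            "computeTotal", "int s = 0; for (int v : values) s += v; return s;");
        for (var e : oldVersion.entrySet()) {
            if (newVersion.containsKey(e.getKey())) continue; // name unchanged
            newVersion.entrySet().stream()
                .max(Comparator.comparingDouble(n -> similarity(e.getValue(), n.getValue())))
                .filter(n -> similarity(e.getValue(), n.getValue()) > 0.7) // invented cutoff
                .ifPresent(n -> System.out.println(e.getKey() + " renamed to " + n.getKey()));
        }
    }
}
```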

Proceedings ArticleDOI
22 May 2011
TL;DR: The results indicate that execution complexity metrics are better indicators of vulnerable code locations than the most commonly-used static complexity metric, lines of source code.
Abstract: Allocating code inspection and testing resources to the most problematic code areas is important to reduce development time and cost. While complexity metrics collected statically from software artifacts are known to be helpful in finding vulnerable code locations, some complex code is rarely executed in practice and has less chance of its vulnerabilities being detected. To augment the use of static complexity metrics, this study examines execution complexity metrics that are collected during code execution as indicators of vulnerable code locations. We conducted case studies on two large, widely used open source projects, the Mozilla Firefox web browser and the Wireshark network protocol analyzer. Our results indicate that execution complexity metrics are better indicators of vulnerable code locations than the most commonly-used static complexity metric, lines of source code. The ability of execution complexity metrics to discriminate vulnerable code locations from neutral code locations and to predict vulnerable code locations varies depending on the project. However, the vulnerability prediction models using execution complexity metrics are superior to the models using static complexity metrics in reducing inspection effort.

Journal ArticleDOI
04 Jun 2011
TL;DR: The results for the large benchmark show that ANEK can quickly infer specifications that are both accurate and qualitatively similar to those written by hand, in 5% of the time taken to manually discover and hand-code the specifications.
Abstract: Static analysis tools aim to find bugs in software that correspond to violations of specifications. Unfortunately, for large and complex software, these specifications are usually either unavailable or sophisticated, and hard to write. This paper presents ANEK, a tool and accompanying methodology for inferring specifications useful for modular typestate checking of programs. In particular, these specifications consist of pre- and postconditions along with aliasing annotations known as access permissions. A novel feature of ANEK is that it can generate program specifications even when the code under analysis gives rise to conflicting constraints, a situation that typically occurs when there are bugs. The design of ANEK also makes it easy to add heuristic constraints that encode intuitions gleaned from several years of experience writing such specifications, and this allows it to infer specifications that are better in a subjective sense. The ANEK algorithm is based on a modular analysis that makes it fast and scalable, while producing reliable specifications. All of these features are enabled by its underlying probabilistic analysis that produces specifications that are very likely. Our implementation of ANEK infers access permissions specifications used by the PLURAL [5] modular typestate checker for Java programs. We have run ANEK on a number of Java benchmark programs, including one large open-source program (approximately 38K lines of code), to infer specifications that were then checked using PLURAL. The results for the large benchmark show that ANEK can quickly infer specifications that are both accurate and qualitatively similar to those written by hand, at 5% of the time taken to manually discover and hand-code the specifications.

Patent
Yingnong Dang, Sadi Khan, Dongmei Zhang, Weipeng Liu, Song Ge, Gong Cheng
20 Dec 2011
TL;DR: A code verification system is described that provides augmented code review with code clone analysis and visualization, helping software developers automatically identify similar instances of the same code and visualize differences in versions of software code over time.
Abstract: A code verification system is described herein that provides augmented code review with code clone analysis and visualization to help software developers automatically identify similar instances of the same code and to visualize differences in versions of software code over time. The system uses code clone search technology to identify code clones and to present the user with information about similar code as the developer makes changes. The system may provide automated notification to the developer or to other teams as changes are made to code segments with one or more related clones. The code verification system also helps the developer to understand architectural evolution of a body of software code. The code verification system provides an analysis component for determining architectural differences based on the code clone detection result between the two versions of the software code base. The code verification system also provides a user interface component for displaying identified differences to developers and others involved with the software development process in intuitive and useful ways.

Proceedings ArticleDOI
18 Jul 2011
TL;DR: Four open source and two commercial tools are compared in terms of the effectiveness and efficiency of their detection capability, using a test suite that implements the discussed requirements for frequent defects selected from public catalogues.
Abstract: Recently, a number of tools for automated code scanning have come into the limelight. Due to the significant costs associated with incorporating such a tool in the software lifecycle, it is important to know what defects are detected and how accurate and efficient the analysis is. We focus specifically on popular static analysis tools for C code defects. Existing benchmarks include the actual defects in open source programs, but they lack systematic coverage of possible code defects and the coding complexities in which they arise. We introduce a test suite implementing the discussed requirements for frequent defects selected from public catalogues. Four open source and two commercial tools are compared in terms of the effectiveness and efficiency of their detection capability. A wide range of C constructs is taken into account and appropriate metrics are computed, which show how the tools balance inherent analysis tradeoffs and efficiency. The results are useful for identifying the appropriate tool, in terms of cost-effectiveness, while the proposed methodology and test suite may be reused.

Proceedings ArticleDOI
17 Jul 2011
TL;DR: An automated, static program analysis is presented that detects such problems with no input other than the program's source code, leveraging the observation that programmer-given identifier names convey information about the semantics of arguments, which can be used to assign equally-typed arguments to their expected positions.
Abstract: In statically-typed programming languages, the compiler ensures that method arguments are passed in the expected order by checking the type of each argument. However, calls to methods with multiple equally-typed parameters slip through this check. The uncertainty about the correct argument order of equally-typed arguments can cause various problems, for example, if a programmer accidentally reverses two arguments. We present an automated, static program analysis that detects such problems without any input except for the source code of a program. The analysis leverages the observation that programmer-given identifier names convey information about the semantics of arguments, which can be used to assign equally-typed arguments to their expected position. We evaluate the approach with a large corpus of Java programs and show that our analysis finds relevant anomalies with a precision of 76%.
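A minimal sketch of the name-based check, with an invented similarity measure: if the identifiers passed as two equally-typed arguments fit the other parameter's position better than their own, the call is flagged as likely swapped:

```java
import java.util.List;

// Flag calls whose argument names match the parameters crosswise.
public class ArgOrderChecker {
    // Crude lexical similarity between a parameter name and an argument name.
    static double similarity(String a, String b) {
        a = a.toLowerCase(); b = b.toLowerCase();
        if (a.equals(b)) return 1.0;
        return (a.contains(b) || b.contains(a)) ? 0.5 : 0.0;
    }

    static boolean looksSwapped(List<String> params, List<String> args) {
        for (int i = 0; i < params.size(); i++)
            for (int j = i + 1; j < params.size(); j++) {
                double asIs    = similarity(params.get(i), args.get(i))
                               + similarity(params.get(j), args.get(j));
                double swapped = similarity(params.get(i), args.get(j))
                               + similarity(params.get(j), args.get(i));
                if (swapped > asIs) return true; // crosswise names fit better
            }
        return false;
    }

    public static void main(String[] args) {
        // void drawRect(int width, int height) called as drawRect(height, width)
        System.out.println(looksSwapped(List.of("width", "height"),
                                        List.of("height", "width"))); // true
    }
}
```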