
Showing papers on "Static program analysis published in 2011"


Proceedings ArticleDOI
21 May 2011
TL;DR: TamiFlex makes static analyses sound with respect to recorded program runs; on large interactive applications it significantly improves the code coverage of static analyses, and on the DaCapo benchmarks the approach even appears complete (the inserted runtime checks issue no warning), enabling sound static whole-program analyses on DaCapo for the first time.
Abstract: Static program analyses and transformations for Java face many problems when analyzing programs that use reflection or custom class loaders: How can a static analysis know which reflective calls the program will execute? How can it get hold of classes that the program loads from remote locations or even generates on the fly? And if the analysis transforms classes, how can these classes be re-inserted into a program that uses custom class loaders? In this paper, we present TamiFlex, a tool chain that offers a partial but often effective solution to these problems. With TamiFlex, programmers can use existing static-analysis tools to produce results that are sound at least with respect to a set of recorded program runs. TamiFlex inserts runtime checks into the program that warn the user in case the program executes reflective calls that the analysis did not take into account. TamiFlex further allows programmers to re-insert offline-transformed classes into a program. We evaluate TamiFlex in two scenarios: benchmarking with the DaCapo benchmark suite and analysing large-scale interactive applications. For the latter, TamiFlex significantly improves code coverage of the static analyses, while for the former our approach even appears complete: the inserted runtime checks issue no warning. Hence, for the first time, TamiFlex enables sound static whole-program analyses on DaCapo. During this process, TamiFlex usually incurs less than 10% runtime overhead.
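A minimal sketch of the record-and-check idea, assuming a hypothetical ReflectionLog wrapper through which reflective calls are routed; TamiFlex's real agent works at the bytecode level, and none of these names are its actual API:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of record-and-check for reflective calls.
public class ReflectionLog {
    // Reflective targets observed during the recorded training runs.
    private static final Set<String> recordedTargets = ConcurrentHashMap.newKeySet();
    private static volatile boolean recording = true;

    public static Object loggedInvoke(Method m, Object receiver, Object... args)
            throws Exception {
        String target = m.getDeclaringClass().getName() + "." + m.getName();
        if (recording) {
            recordedTargets.add(target);  // record phase: remember the call target
        } else if (!recordedTargets.contains(target)) {
            // check phase: the static analysis only covered recorded targets,
            // so an unrecorded reflective call voids its soundness guarantee
            System.err.println("WARNING: unrecorded reflective call: " + target);
        }
        return m.invoke(receiver, args);
    }

    public static void stopRecording() { recording = false; }
}
```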

239 citations


Book ChapterDOI
26 Mar 2011
TL;DR: This work proposes a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems: a purely static approach based on predictive modelling and program features that, across a suite of 47 benchmarks, outperforms dynamic, multi-core-only, and GPU-only alternatives.
Abstract: Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any heterogeneous system is difficult and depends on the hardware and software implementation. We propose a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems. We develop a purely static approach based on predictive modelling and program features. When evaluated over a suite of 47 benchmarks, our model achieves a speedup of 1.57 over a state-of-the-art dynamic run-time approach, a speedup of 3.02 over a purely multi-core approach and 1.55 over the performance achieved by using just the GPU.
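A toy sketch of the underlying idea: static kernel features are mapped to a predicted work split. The features, weights, and linear form here are invented for illustration and merely stand in for the paper's trained predictive model:

```java
// Toy feature-based partition prediction for a CPU-GPU system.
public class PartitionPredictor {
    static final class KernelFeatures {
        final double computePerMemOp;   // arithmetic ops per memory op
        final double transferMegabytes; // data to copy to the GPU
        final double branchDensity;     // fraction of divergent branches
        KernelFeatures(double c, double t, double b) {
            computePerMemOp = c; transferMegabytes = t; branchDensity = b;
        }
    }

    /** Predicted fraction of the work-items to run on the GPU, in [0, 1]. */
    static double gpuFraction(KernelFeatures f) {
        // Hypothetical linear model: compute-heavy, transfer-light,
        // branch-free kernels favor the GPU.
        double score = 0.5 + 0.10 * f.computePerMemOp
                           - 0.05 * f.transferMegabytes
                           - 0.30 * f.branchDensity;
        return Math.max(0.0, Math.min(1.0, score));
    }

    public static void main(String[] args) {
        KernelFeatures matmulLike = new KernelFeatures(8.0, 2.0, 0.0);
        System.out.printf("predicted GPU share: %.2f%n", gpuFraction(matmulLike));
    }
}
```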

220 citations


Proceedings ArticleDOI
21 Mar 2011
TL;DR: This paper represents an SPL in a form where conventional static program analysis techniques can be applied to find irrelevant features for a test, and uses this information to reduce the combinatorial number of SPL programs to examine.
Abstract: A Software Product Line (SPL) is a family of programs where each program is defined by a unique combination of features. Testing or checking properties of an SPL is hard as it may require the examination of a combinatorial number of programs. In reality, however, features are often irrelevant for a given test - they augment, but do not change, existing behavior, making many feature combinations unnecessary as far as testing is concerned. In this paper we show how to reduce the amount of effort in testing an SPL. We represent an SPL in a form where conventional static program analysis techniques can be applied to find irrelevant features for a test. We use this information to reduce the combinatorial number of SPL programs to examine.
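A minimal illustration, with invented feature names, of what makes a feature irrelevant to a test: it augments observable behavior without changing the value the test checks, so the configurations to examine can be halved:

```java
// Invented two-feature product line: LOGGING augments behavior but never
// changes the value this test observes, so the test result is independent
// of it and only the other feature needs to be varied.
public class SplExample {
    static boolean FEAT_DISCOUNT = true; // the feature this test exercises
    static boolean FEAT_LOGGING  = true; // irrelevant to this test

    static int price(int base) {
        int p = FEAT_DISCOUNT ? base - 10 : base;
        if (FEAT_LOGGING) {
            System.out.println("computed price " + p); // augments, never changes p
        }
        return p;
    }

    public static void main(String[] args) {
        // A static analysis can establish that price()'s return value does not
        // depend on FEAT_LOGGING, halving the configurations this test covers.
        System.out.println(price(100) == 90 ? "test passed" : "test FAILED");
    }
}
```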

130 citations


Proceedings ArticleDOI
17 Oct 2011
TL;DR: This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behavior of the obfuscated code, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis.
Abstract: When new malware are discovered, it is important for researchers to analyze and understand them as quickly as possible. This task has been made more difficult in recent years as researchers have seen an increasing use of virtualization-obfuscated malware code. These programs are difficult to comprehend and reverse engineer, since they are resistant to both static and dynamic analysis techniques. Current approaches to dealing with such code first reverse-engineer the byte code interpreter, then use this to work out the logic of the byte code program. This outside-in approach produces good results when the structure of the interpreter is known, but cannot be applied to all cases. This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behavior of the obfuscated code. This inside-out approach requires fewer assumptions, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis. Results from a prototype tool on real-world malicious code are encouraging.
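A sketch of the inside-out intuition over an invented trace format: walking the executed steps backwards from the values the program makes observable, only steps whose results flow into those values are kept, discarding interpreter bookkeeping:

```java
import java.util.*;

// Backward relevance pass over a (hypothetical) execution trace.
public class RelevanceSlicer {
    record Step(int id, String def, List<String> uses) {}

    static Set<Integer> relevant(List<Step> trace, Set<String> observed) {
        Set<String> live = new HashSet<>(observed); // values still awaiting a definition
        Set<Integer> keep = new TreeSet<>();
        for (int i = trace.size() - 1; i >= 0; i--) {
            Step s = trace.get(i);
            if (live.remove(s.def())) {  // this step defined an observable-relevant value
                keep.add(s.id());
                live.addAll(s.uses());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Step> trace = List.of(
            new Step(1, "vpc", List.of("vpc")),   // virtual-PC bookkeeping
            new Step(2, "x",   List.of("input")),
            new Step(3, "vpc", List.of("vpc")),   // more bookkeeping
            new Step(4, "out", List.of("x")));    // value passed to a system call
        System.out.println(relevant(trace, Set.of("out"))); // prints [2, 4]
    }
}
```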

127 citations


Journal ArticleDOI
TL;DR: This paper proposes three views (realized by a tool called TeMo) that combine information from a software project's versioning system, the size of the various artifacts, and the test coverage reports; the views can recognize different co-evolution scenarios and yield relevant observations for both developers and test engineers.
Abstract: Many software production processes advocate rigorous development testing alongside functional code writing, which implies that both test code and production code should co-evolve. To gain insight into the nature of this co-evolution, this paper proposes three views (realized by a tool called TeMo) that combine information from a software project's versioning system, the size of the various artifacts, and the test coverage reports. We validate these views against two open source and one industrial software project and evaluate our results with the help of log messages, code inspections, and the original developers of the software system. With these views we could recognize different co-evolution scenarios (i.e., synchronous and phased) and make relevant observations for both developers and test engineers.

122 citations


Proceedings ArticleDOI
25 Sep 2011
TL;DR: The results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes and improves the performance of prediction models to classify Java interfaces into change-prone and not change-prone.
Abstract: Recent empirical studies have investigated the use of source code metrics to predict the change- and defect-proneness of source code files and classes. While results showed strong correlations and good predictive power of these metrics, they do not distinguish between interface, abstract or concrete classes. In particular, interfaces declare contracts that are meant to remain stable during the evolution of a software system while the implementation in concrete classes is more likely to change. This paper aims at investigating to which extent the existing source code metrics can be used for predicting change-prone Java interfaces. We empirically investigate the correlation between metrics and the number of fine-grained source code changes in interfaces of ten Java open-source systems. Then, we evaluate the metrics to calculate models for predicting change-prone Java interfaces. Our results show that the external interface cohesion metric exhibits the strongest correlation with the number of source code changes. This metric also improves the performance of prediction models to classify Java interfaces into change-prone and not change-prone.

121 citations


Proceedings ArticleDOI
21 May 2011
TL;DR: This paper shares experiences in designing a static change impact analysis framework for large and evolving industrial software systems; the framework is implemented as a tool called Imp and applied to an industrial codebase with over a million lines of C/C++ code, with promising empirical results.
Abstract: Change impact analysis, i.e., knowing the potential consequences of a software change, is critical for the risk analysis, developer effort estimation, and regression testing of evolving software. Static program slicing is an attractive option for enabling routine change impact analysis for newly committed changesets during daily software build. For small programs with a few thousand lines of code, static program slicing scales well and can assist precise change impact analysis. However, as we demonstrate in this paper, static program slicing faces unique challenges when applied routinely on large and evolving industrial software systems. Despite recent advances in static program slicing, to our knowledge, there have been no studies of static change impact analysis applied on large and evolving industrial software systems. In this paper, we share our experiences in designing a static change impact analysis framework for such software systems. We have implemented our framework as a tool called Imp and have applied Imp on an industrial codebase with over a million lines of C/C++ code with promising empirical results.
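A minimal sketch of the forward-impact idea: every entity reachable from a changed entity along dependence edges is potentially impacted. Building those edges precisely and scalably is the hard part a tool like Imp addresses; the graph below is hand-made for illustration:

```java
import java.util.*;

// Change impact as reachability over a dependence graph.
public class ChangeImpact {
    static Set<String> impacted(Map<String, List<String>> dependedOnBy, Set<String> changed) {
        Set<String> seen = new HashSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            for (String next : dependedOnBy.getOrDefault(work.pop(), List.of())) {
                if (seen.add(next)) work.push(next); // newly impacted entity
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // edge u -> v means "v depends on u"
        Map<String, List<String>> dependedOnBy = Map.of(
            "parse()",   List.of("compile()"),
            "compile()", List.of("build()", "runTests()"));
        System.out.println(impacted(dependedOnBy, Set.of("parse()")));
    }
}
```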

119 citations


Proceedings ArticleDOI
21 May 2011
TL;DR: This paper presents a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM and shows that SCC outperforms LM for learning bug prediction models.
Abstract: A significant amount of research effort has been dedicated to learning prediction models that allow project managers to efficiently allocate resources to those parts of a software system that most likely are bug-prone and therefore critical. Prominent measures for building bug prediction models are product measures (e.g., complexity) and process measures (e.g., code churn). Code churn in terms of lines modified (LM) and past changes turned out to be significant indicators of bugs. However, these measures are rather imprecise and do not reflect all the detailed changes of particular source code entities during maintenance activities. In this paper, we explore the advantage of using fine-grained source code changes (SCC) for bug prediction. SCC captures the exact code changes and their semantics down to statement level. We present a series of experiments using different machine learning algorithms with a dataset from the Eclipse platform to empirically evaluate the performance of SCC and LM. The results show that SCC outperforms LM for learning bug prediction models.

110 citations


Proceedings ArticleDOI
17 Jul 2011
TL;DR: The proposed technique splits the generated path conditions into constraints that a decision procedure can solve and complex non-linear constraints with uninterpreted functions representing external library calls, enabling classical symbolic execution to cover paths that other dynamic symbolic execution approaches cannot.
Abstract: Symbolic execution is a powerful static program analysis technique that has been used for the automated generation of test inputs. Directed Automated Random Testing (DART) is a dynamic variant of symbolic execution that initially uses random values to execute a program and collects symbolic path conditions during the execution. These conditions are then used to produce new inputs to execute the program along different paths. It has been argued that DART can handle situations where classical static symbolic execution fails due to incompleteness in decision procedures and its inability to handle external library calls. We propose here a technique that mitigates these previous limitations of classical symbolic execution. The proposed technique splits the generated path conditions into (a) constraints that can be solved by a decision procedure and (b) complex non-linear constraints with uninterpreted functions to represent external library calls. The solutions generated from the decision procedure are used to simplify the complex constraints and the resulting path conditions are checked again for satisfiability. We also present heuristics that can further improve our technique. We show how our technique can enable classical symbolic execution to cover paths that other dynamic symbolic execution approaches cannot cover. Our method has been implemented within the Symbolic PathFinder tool and has been applied to several examples, including two from the NASA domain.
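A toy illustration of the splitting step, with brute-force enumeration standing in for the decision procedure and an invented libraryHash standing in for an external call that is modelled as uninterpreted during path collection:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Phase 1 solves the decidable conjuncts; phase 2 substitutes each solution
// into the conjunct that calls the external function and rechecks it.
public class SplitSolver {
    static int libraryHash(int x) { return (x * 31) & 0xff; } // external call

    public static void main(String[] args) {
        // Path condition: x > 3 && x < 6 && libraryHash(x) == libraryHash(5)
        List<IntPredicate> decidable = List.of(x -> x > 3, x -> x < 6);
        IntPredicate complex = x -> libraryHash(x) == libraryHash(5);

        List<Integer> solutions = new ArrayList<>();   // phase 1: decidable part
        for (int x = -16; x <= 16; x++) {
            final int xi = x;
            if (decidable.stream().allMatch(p -> p.test(xi))) solutions.add(xi);
        }
        for (int xi : solutions) {                     // phase 2: substitute and recheck
            if (complex.test(xi)) {
                System.out.println("satisfying input: x = " + xi); // prints x = 5
                return;
            }
        }
        System.out.println("no solution found among candidates");
    }
}
```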

101 citations


Book ChapterDOI
18 May 2011
TL;DR: This paper introduces a novel code obfuscation scheme that applies the concept of software diversification to the control flow graph of the software to enhance its complexity; the scheme improves resistance against both static disassembling tools and dynamic reverse engineering at a reasonable performance penalty.
Abstract: The process of reverse engineering allows attackers to understand the behavior of software and extract proprietary algorithms and data structures (e.g. cryptographic keys) from it. Code obfuscation is frequently employed to mitigate this risk. However, while most of today's obfuscation methods are targeted against static reverse engineering, where the attacker analyzes the code without actually executing it, they are still insecure against dynamic analysis techniques, where the behavior of the software is inspected at runtime. In this paper, we introduce a novel code obfuscation scheme that applies the concept of software diversification to the control flow graph of the software to enhance its complexity. Our approach aims at making dynamic reverse engineering considerably harder as the information an attacker can retrieve from the analysis of a single run of the program with a certain input is useless for understanding the program behavior on other inputs. Based on a prototype implementation we show that our approach improves resistance against both static disassembling tools and dynamic reverse engineering at a reasonable performance penalty.
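A toy illustration of the diversification idea, with invented variants: several semantically equivalent implementations of the same computation coexist and the input selects among them, so a trace of one run reveals little about the control flow of another:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Input-dependent dispatch over semantically equivalent variants of abs().
public class DiversifiedAbs {
    static final List<IntUnaryOperator> VARIANTS = List.of(
        x -> x < 0 ? -x : x,                    // variant 0: plain branch
        x -> (x ^ (x >> 31)) - (x >> 31),       // variant 1: branch-free bit trick
        x -> (int) Math.sqrt((double) x * x));  // variant 2: arithmetic detour

    static int abs(int x) {
        // different inputs exercise different control-flow variants
        int variant = Math.floorMod(x * 2654435761L, VARIANTS.size());
        return VARIANTS.get(variant).applyAsInt(x);
    }

    public static void main(String[] args) {
        for (int x : new int[] { -7, 3, -12 })
            System.out.println("abs(" + x + ") = " + abs(x));
    }
}
```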

87 citations


Proceedings ArticleDOI
28 Mar 2011
TL;DR: This paper proposes a static program analysis to automatically detect asynchronism-related bugs in web applications and develops a technique that extracts client-side JavaScript code from server-side scripts; an evaluation on real-world applications shows the analysis can effectively identify real bugs.
Abstract: Ajax is becoming more and more important for web applications that care about client-side user experience. It allows sending requests asynchronously, without blocking clients from continuing execution. Callback functions are only executed upon receiving the responses. While this mechanism makes browsing a smooth experience, it may cause severe problems in the presence of unexpected network latency, due to the non-determinism of asynchronism. In this paper, we demonstrate the possible problems caused by the asynchronism and propose a static program analysis to automatically detect such bugs in web applications. As client-side Ajax code is often wrapped in server-side scripts, we also develop a technique that extracts client-side JavaScript code from server-side scripts. We evaluate our technique on a number of real-world web applications. Our results show that it can effectively identify real bugs. We also discuss possible ways to avoid such bugs.
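The bug class transliterated into Java futures (the paper targets JavaScript/Ajax; the timings and names here are invented): the second callback assumes state installed by the first, so unexpected latency flips the callback order and crashes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// The "loadData" handler dereferences state that only the slower "login"
// handler installs, so the reordered callbacks trigger a NullPointerException.
public class AsyncRaceDemo {
    static volatile String session; // set by the login callback

    static CompletableFuture<Void> request(long latencyMs, Runnable callback) {
        return CompletableFuture.runAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(latencyMs); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            callback.run();
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> login = request(200, () -> session = "user42");
        CompletableFuture<Void> load  = request(50, () ->
            System.out.println("loading data for session of length " + session.length()));
        CompletableFuture.allOf(login, load).join(); // throws: load's callback ran first
    }
}
```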

Proceedings ArticleDOI
25 Sep 2011
TL;DR: Building on a successful approach to splitting identifiers, an implementation of an expansion algorithm is presented; experiments find that up to 66% of identifiers are correctly expanded, which is within about 20% of the current system's best-case performance.
Abstract: Maintaining modern software requires significant tool support. Effective tools exploit a variety of information and techniques to aid a software maintainer. One area of recent interest in tool development exploits the natural language information found in source code. Such Information Retrieval (IR) based tools complement traditional static analysis tools and have tackled problems, such as feature location, that otherwise require considerable human effort. To reap the full benefit of IR-based techniques, the language used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) must be consistent. Unfortunately, there is a significant proportion of invented vocabulary in source code. Vocabulary normalization aligns the vocabulary found in the source code with that found in other software artifacts. Most existing work related to normalization has focused on splitting an identifier into its constituent parts. The next step is to expand each part into a (dictionary) word that matches the vocabulary used in other software artifacts. Building on a successful approach to splitting identifiers, an implementation of an expansion algorithm is presented. Experiments on two systems find that up to 66% of identifiers are correctly expanded, which is within about 20% of the current system's best-case performance. Not only is this performance comparable to previous techniques, but the result is achieved in the absence of special purpose rules and not limited to restricted syntactic contexts. Results from these experiments also show the impact that varying levels of documentation (including both internal documentation such as the requirements and design, and external, or user-level, documentation) have on the algorithm's performance.
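A minimal sketch of the normalization pipeline, with a canned dictionary standing in for vocabulary mined from the project's artifacts; the splitting regex and subsequence-based expansion are simplified stand-ins for the paper's algorithm:

```java
import java.util.ArrayList;
import java.util.List;

// Split an identifier at underscores and camel-case boundaries, then expand
// each part to the first dictionary word it abbreviates.
public class IdentifierExpander {
    static final List<String> DICT =
        List.of("average", "counter", "length", "message");

    // part abbreviates word if it is a subsequence sharing the first letter
    static boolean abbreviates(String part, String word) {
        if (part.isEmpty() || part.charAt(0) != word.charAt(0)) return false;
        int i = 0;
        for (char c : word.toCharArray())
            if (i < part.length() && c == part.charAt(i)) i++;
        return i == part.length();
    }

    static List<String> expand(String identifier) {
        List<String> words = new ArrayList<>();
        for (String part : identifier.split("_|(?<=[a-z])(?=[A-Z])")) {
            String p = part.toLowerCase();
            words.add(DICT.stream().filter(w -> abbreviates(p, w)).findFirst().orElse(p));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(expand("avgLen"));  // [average, length]
        System.out.println(expand("msg_cnt")); // [message, counter]
    }
}
```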

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper presents a novel search technique that uses information such as the position of the query word and its semantic role to calculate relevance and shows that this approach is more consistently effective than three other state of the art search techniques.
Abstract: As software continues to grow, locating code for maintenance tasks becomes increasingly difficult. Software search tools help developers find source code relevant to their maintenance tasks. One major challenge to successful search tools is locating relevant code when the user's query contains words with multiple meanings or words that occur frequently throughout the program. Traditional search techniques, which treat each word individually, are unable to distinguish relevant and irrelevant methods under these conditions. In this paper, we present a novel search technique that uses information such as the position of the query word and its semantic role to calculate relevance. Our evaluation shows that this approach is more consistently effective than three other state of the art search techniques.

Proceedings ArticleDOI
21 Mar 2011
TL;DR: The aim of this paper is to describe our experience using different tools for code smell detection and to outline the main differences among them and the differing results they produce.
Abstract: Detecting code smells in the code and consequently applying the right refactoring steps when necessary is very important to improve the quality of the code. Different tools have been proposed for code smell detection, each one characterized by particular features. The aim of this paper is to describe our experience using different tools for code smell detection. We outline the main differences among them and the different results we obtained.

Proceedings ArticleDOI
09 Sep 2011
TL;DR: This study looks at different kinds of coupling concepts between source code entities, including structural dependencies, fan-out similarity, evolutionary coupling, code ownership, code clones, and semantic similarity, and provides insights into how to support developers in modularizing software systems.
Abstract: Software systems are modularized to make their inherent complexity manageable. While there exists a set of well-known principles that may guide software engineers to design the modules of a software system, we do not know which principles are followed in practice. In a study based on 16 open source projects, we look at different kinds of coupling concepts between source code entities, including structural dependencies, fan-out similarity, evolutionary coupling, code ownership, code clones, and semantic similarity. The congruence between these coupling concepts and the modularization of the system hints at the modularity principles used in practice. Furthermore, the results provide insights on how to support developers to modularize software systems.

Journal ArticleDOI
TL;DR: The achieved results confirm the conjecture that providing developers with the similarity between code and high-level artifacts helps to improve the quality of the source code lexicon, and they indicate the potential usefulness of COCONUT as a feature for software development environments.
Abstract: The paper presents an approach helping developers to maintain source code identifiers and comments consistent with high-level artifacts. Specifically, the approach computes and shows the textual similarity between source code and related high-level artifacts. Our conjecture is that developers are induced to improve the source code lexicon, i.e., terms used in identifiers or comments, if the software development environment provides information about the textual similarity between the source code under development and the related high-level artifacts. The proposed approach also recommends candidate identifiers built from high-level artifacts related to the source code under development and has been implemented as an Eclipse plug-in, called COde Comprehension Nurturant Using Traceability (COCONUT). The paper also reports on two controlled experiments performed with master's and bachelor's students. The goal of the experiments is to evaluate the quality of identifiers and comments (in terms of their consistency with high-level artifacts) in the source code produced when using or not using COCONUT. The achieved results confirm our conjecture that providing the developers with similarity between code and high-level artifacts helps to improve the quality of source code lexicon. This indicates the potential usefulness of COCONUT as a feature for software development environments.

Proceedings ArticleDOI
22 Jun 2011
TL;DR: A novel static concept location technique is proposed that leverages both the textual information present in the code and the structural dependencies between source code elements, clustering the source code with the Border Flow algorithm over combined structural and textual data.
Abstract: One of the most common comprehension activities undertaken by developers is concept location in source code. In the context of software change, concept location means finding locations in source code where changes are to be made in response to a modification request. Static techniques for concept location usually rely on searching the source code using textual information or on navigating the dependencies among software elements. In this paper we propose a novel static concept location technique, which leverages both the textual information present in the code and the structural dependencies between source code elements. The technique employs a textual search within the source code, which is clustered using the Border Flow algorithm based on combined structural and textual data. We evaluated the technique against a text search based baseline approach using data on almost 200 changes from five software systems. The results indicate that the new approach outperforms the baseline and that improvements are still possible.

Proceedings Article
15 Jun 2011
TL;DR: CODE is a tool that automatically detects software configuration errors by identifying invariant configuration access rules that predict which access events follow which contexts; using these rules, it can sift through a voluminous number of events and detect deviant program executions.
Abstract: Software failures due to configuration errors are commonplace as computer systems continue to grow larger and more complex. Troubleshooting these configuration errors is a major administration cost, especially in server clusters where problems often go undetected without user interference. This paper presents CODE, a tool that automatically detects software configuration errors. Our approach is based on identifying invariant configuration access rules that predict what access events follow what contexts. It requires no source code, application-specific semantics, or heavyweight program analysis. Using these rules, CODE can sift through a voluminous number of events and detect deviant program executions. This is in contrast to previous approaches that focus only on diagnosis. In our experiments, CODE successfully detected a real configuration error in one of our deployment machines, in addition to 20 user-reported errors that we reproduced in our test environment. When analyzing month-long event logs from both user desktops and production servers, CODE yielded a low false positive rate. The efficiency of CODE makes it feasible to be deployed as a practical management tool with low overhead.
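A sketch of the rule-learning idea with a deliberately simple context (the previous event); CODE's actual contexts and matching are richer, and all event names here are invented:

```java
import java.util.*;

// Learn "event X invariably follows context C" rules from correct runs,
// then flag runs that deviate from a learned rule.
public class ConfigRuleChecker {
    static Map<String, String> learn(List<List<String>> goodRuns) {
        Map<String, String> follows = new HashMap<>();
        Map<String, Boolean> consistent = new HashMap<>();
        for (List<String> run : goodRuns)
            for (int i = 1; i < run.size(); i++) {
                String ctx = run.get(i - 1), ev = run.get(i);
                String prev = follows.putIfAbsent(ctx, ev);
                if (prev != null && !prev.equals(ev)) consistent.put(ctx, false);
            }
        follows.keySet().removeIf(k -> Boolean.FALSE.equals(consistent.get(k)));
        return follows; // only the invariant rules survive
    }

    public static void main(String[] args) {
        var rules = learn(List.of(
            List.of("open(app.conf)", "read(port)", "read(host)"),
            List.of("open(app.conf)", "read(port)", "read(host)")));
        List<String> suspect = List.of("open(app.conf)", "read(host)");
        for (int i = 1; i < suspect.size(); i++) {
            String expected = rules.get(suspect.get(i - 1));
            if (expected != null && !expected.equals(suspect.get(i)))
                System.out.println("deviation: expected " + expected
                        + ", saw " + suspect.get(i));
        }
    }
}
```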

Book ChapterDOI
14 Jul 2011
TL;DR: This paper describes a tool that applies theorem-proving technology to synthesize code fragments that use given library functions; the tool has proven useful for synthesizing code fragments for common programming tasks and is a good platform for exploring software synthesis techniques.
Abstract: We describe a tool that applies theorem proving technology to synthesize code fragments that use given library functions. To determine candidate code fragments, our approach takes into account polymorphic type constraints as well as test cases. Our tool interactively displays a ranked list of suggested code fragments that are appropriate for the current program point. We have found our system to be useful for synthesizing code fragments for common programming tasks, and we believe it is a good platform for exploring software synthesis techniques.

Proceedings ArticleDOI
10 Nov 2011
TL;DR: This paper describes active code completion, an architecture that allows library developers to introduce interactive and highly specialized code generation interfaces, called palettes, directly into the editor, and presents one such system, named Graphite, built for the Eclipse Java development environment.
Abstract: Software developers today make heavy use of the code completion features available in modern code editors [1]. By navigating and selecting from a floating menu containing the names of variables, fields, methods, types and code snippets, a developer can avoid many common spelling and logic errors, avoid unnecessary keystrokes, and explore unfamiliar APIs without leaving the editor window. To ensure that the items featured in this menu are relevant, the editor conducts a static analysis of the surrounding code context.

Proceedings ArticleDOI
05 Jun 2011
TL;DR: Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.
Abstract: This paper presents an approach for accurately estimating the execution time of parallel software components in complex embedded systems. Timing annotations obtained from highly optimized binary code are added to the source code of software components which is then integrated into a SystemC transaction-level simulation. This approach allows a fast evaluation of software execution times while being as accurate as conventional instruction set simulators. By simulating binary-level control flow in parallel to the original functionality of the software, even compiler optimizations heavily modifying the structure of the generated code can be modeled accurately. Experimental results show that the presented method produces timing estimates within the same level of accuracy as an established commercial tool for cycle-accurate instruction set simulation while being at least 20 times faster.

Proceedings ArticleDOI
Yu Pei, Yi Wei, Carlo A. Furia, Martin Nordio, Bertrand Meyer
06 Nov 2011
TL;DR: Applications of AutoFix-E2 to general-purpose software, such as a library to manipulate documents, show that the approach provides an improvement over previous techniques, in particular purely model-based approaches.
Abstract: Initial research in automated program fixing has generally limited itself to specific areas, such as data structure classes with carefully designed interfaces, and relied on simple approaches. To provide high-quality fix suggestions in a broad area of applicability, the present work relies on the presence of contracts in the code, and on the availability of static and dynamic analyses to gather evidence on the values taken by expressions derived from the code. The ideas have been built into the AutoFix-E2 automatic fix generator. Applications of AutoFix-E2 to general-purpose software, such as a library to manipulate documents, show that the approach provides an improvement over previous techniques, in particular purely model-based approaches.

Proceedings ArticleDOI
17 Jul 2011
TL;DR: The string-analysis algorithm is implemented and used to augment an industrial security analysis for Web applications by automatically detecting and verifying sanitizers: methods that eliminate malicious patterns from untrusted strings, making those strings safe to use in security-sensitive operations.
Abstract: We propose a novel technique for statically verifying the strings generated by a program. The verification is conducted by encoding the program in Monadic Second-Order Logic (M2L). We use M2L to describe constraints among program variables and to abstract built-in string operations. Once we encode a program in M2L, a theorem prover for M2L, such as MONA, can automatically check if a string generated by the program satisfies a given specification, and if not, exhibit a counterexample. With this approach, we can naturally encode relationships among strings, accounting also for cases in which a program manipulates strings using indices. In addition, our string analysis is path sensitive in that it accounts for the effects of string and Boolean comparisons, as well as regular-expression matches. We have implemented our string-analysis algorithm and used it to augment an industrial security analysis for Web applications by automatically detecting and verifying sanitizers: methods that eliminate malicious patterns from untrusted strings, making those strings safe to use in security-sensitive operations. On the 8 benchmarks we analyzed, our string analyzer discovered 128 previously unknown sanitizers, compared to 71 sanitizers detected by a previously presented string analysis.

Proceedings ArticleDOI
11 Apr 2011
TL;DR: A controlled experiment with a large real-world SPL with over 160,000 lines of code and 340 features shows that background colors can improve program comprehension in large SPLs.
Abstract: Background: Software product line engineering provides an effective mechanism to implement variable software. However, the usage of preprocessors, which is typical in industry, is heavily criticized, because it often leads to obfuscated code. Using background colors to support comprehensibility has been shown to be effective; however, scalability to large software product lines (SPLs) is questionable. Aim: Our goal is to implement and evaluate scalable usage of background colors for industrial-sized SPLs. Method: We designed and implemented scalable concepts in a tool called FeatureCommander. To evaluate its effectiveness, we conducted a controlled experiment with a large real-world SPL with over 160,000 lines of code and 340 features. We used a within-subjects design with treatments colors and no colors. We compared correctness and response time of tasks for both treatments. Results: For certain kinds of tasks, background colors improve program comprehension. Furthermore, subjects generally favor background colors. Conclusion: We show that background colors can improve program comprehension in large SPLs. Based on these encouraging results, we will continue our work improving program comprehension in large SPLs.

Proceedings ArticleDOI
05 Sep 2011
TL;DR: A tool named Historage is presented that provides entire histories of fine-grained entities in Java, such as methods, constructors, and fields, and that traces entity histories across renaming changes.
Abstract: Software systems are changed continuously to adapt to the environment, correct faults, improve performance, and so on. For in-depth analysis related to software evolution, it is informative to obtain the histories of fine-grained source code entities. This paper presents a tool named Historage that can provide entire histories of fine-grained entities in Java, such as methods, constructors, and fields. A characteristic of Historage is the ability to trace entity histories across renaming changes. We applied our technique to five open source software projects to quantitatively evaluate the renaming change identification.
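A sketch of rename-aware tracking under invented thresholds and a toy token-set similarity: a method that disappears under its old name is matched to the new method with the most similar body, so its history continues across the rename:

```java
import java.util.*;

// Match a method missing from the new version to its most similar successor.
public class RenameTracker {
    // Jaccard similarity over the bodies' token sets (a crude stand-in).
    static double similarity(String a, String b) {
        Set<String> ta = new HashSet<>(List.of(a.split("\\W+")));
        Set<String> tb = new HashSet<>(List.of(b.split("\\W+")));
        Set<String> inter = new HashSet<>(ta); inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta); union.addAll(tb);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Map<String, String> oldVersion = Map.of(
            "calcSum", "int s = 0; for (int v : values) s += v; return s;");
        Map<String, String> newVersion = Map.of(
            "computeTotal", "int s = 0; for (int v : values) s += v; return s;");
        for (var e : oldVersion.entrySet()) {
            if (newVersion.containsKey(e.getKey())) continue; // name unchanged
            newVersion.entrySet().stream()
                .max(Comparator.comparingDouble(n -> similarity(e.getValue(), n.getValue())))
                .filter(n -> similarity(e.getValue(), n.getValue()) > 0.7) // invented cutoff
                .ifPresent(n -> System.out.println(e.getKey() + " renamed to " + n.getKey()));
        }
    }
}
```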

Proceedings ArticleDOI
22 May 2011
TL;DR: The results indicate that execution complexity metrics are better indicators of vulnerable code locations than the most commonly-used static complexity metric, lines of source code.
Abstract: Allocating code inspection and testing resources to the most problematic code areas is important to reduce development time and cost. While complexity metrics collected statically from software artifacts are known to be helpful in finding vulnerable code locations, some complex code is rarely executed in practice and has less chance of its vulnerabilities being detected. To augment the use of static complexity metrics, this study examines execution complexity metrics that are collected during code execution as indicators of vulnerable code locations. We conducted case studies on two large, widely used open source projects, the Mozilla Firefox web browser and the Wireshark network protocol analyzer. Our results indicate that execution complexity metrics are better indicators of vulnerable code locations than the most commonly-used static complexity metric, lines of source code. The ability of execution complexity metrics to discriminate vulnerable code locations from neutral code locations and to predict vulnerable code locations varies depending on the project. However, the vulnerability prediction models using execution complexity metrics are superior to the models using static complexity metrics in reducing inspection effort.

Journal ArticleDOI
04 Jun 2011
TL;DR: The results for the large benchmark show that ANEK can quickly infer specifications that are both accurate and qualitatively similar to those written by hand, in 5% of the time taken to manually discover and hand-code the specifications.
Abstract: Static analysis tools aim to find bugs in software that correspond to violations of specifications. Unfortunately, for large and complex software, these specifications are usually either unavailable or sophisticated, and hard to write. This paper presents ANEK, a tool and accompanying methodology for inferring specifications useful for modular typestate checking of programs. In particular, these specifications consist of pre- and postconditions along with aliasing annotations known as access permissions. A novel feature of ANEK is that it can generate program specifications even when the code under analysis gives rise to conflicting constraints, a situation that typically occurs when there are bugs. The design of ANEK also makes it easy to add heuristic constraints that encode intuitions gleaned from several years of experience writing such specifications, and this allows it to infer specifications that are better in a subjective sense. The ANEK algorithm is based on a modular analysis that makes it fast and scalable, while producing reliable specifications. All of these features are enabled by its underlying probabilistic analysis that produces specifications that are very likely. Our implementation of ANEK infers access permissions specifications used by the PLURAL [5] modular typestate checker for Java programs. We have run ANEK on a number of Java benchmark programs, including one large open-source program (approximately 38K lines of code), to infer specifications that were then checked using PLURAL. The results for the large benchmark show that ANEK can quickly infer specifications that are both accurate and qualitatively similar to those written by hand, at 5% of the time taken to manually discover and hand-code the specifications.

Patent
Yingnong Dang, Sadi Khan, Dongmei Zhang, Weipeng Liu, Song Ge, Gong Cheng
20 Dec 2011
TL;DR: A code verification system is described that provides augmented code review with code clone analysis and visualization, helping software developers automatically identify similar instances of the same code and visualize differences in versions of software code over time.
Abstract: A code verification system is described herein that provides augmented code review with code clone analysis and visualization to help software developers automatically identify similar instances of the same code and to visualize differences in versions of software code over time. The system uses code clone search technology to identify code clones and to present the user with information about similar code as the developer makes changes. The system may provide automated notification to the developer or to other teams as changes are made to code segments with one or more related clones. The code verification system also helps the developer to understand architectural evolution of a body of software code. The code verification system provides an analysis component for determining architectural differences based on the code clone detection result between the two versions of the software code base. The code verification system also provides a user interface component for displaying identified differences to developers and others involved with the software development process in intuitive and useful ways.

Proceedings ArticleDOI
18 Jul 2011
TL;DR: Four open source and two commercial tools are compared in terms of the effectiveness and efficiency of their detection capability, using a test suite that implements the discussed requirements for frequent defects selected from public catalogues.
Abstract: Recently, a number of tools for automated code scanning have come into the limelight. Due to the significant costs associated with incorporating such a tool in the software lifecycle, it is important to know what defects are detected and how accurate and efficient the analysis is. We focus specifically on popular static analysis tools for C code defects. Existing benchmarks include the actual defects in open source programs, but they lack systematic coverage of possible code defects and the coding complexities in which they arise. We introduce a test suite implementing the discussed requirements for frequent defects selected from public catalogues. Four open source and two commercial tools are compared in terms of the effectiveness and efficiency of their detection capability. A wide range of C constructs is taken into account and appropriate metrics are computed, which show how the tools balance inherent analysis tradeoffs and efficiency. The results are useful for identifying the appropriate tool, in terms of cost-effectiveness, while the proposed methodology and test suite may be reused.

Proceedings ArticleDOI
17 Jul 2011
TL;DR: An automated, static program analysis is presented that detects such problems with no input other than the program's source code, leveraging the observation that programmer-given identifier names convey information about the semantics of arguments, which can be used to assign equally-typed arguments to their expected positions.
Abstract: In statically-typed programming languages, the compiler ensures that method arguments are passed in the expected order by checking the type of each argument. However, calls to methods with multiple equally-typed parameters slip through this check. The uncertainty about the correct argument order of equally-typed arguments can cause various problems, for example, if a programmer accidentally reverses two arguments. We present an automated, static program analysis that detects such problems without any input except for the source code of a program. The analysis leverages the observation that programmer-given identifier names convey information about the semantics of arguments, which can be used to assign equally-typed arguments to their expected position. We evaluate the approach with a large corpus of Java programs and show that our analysis finds relevant anomalies with a precision of 76%.
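A minimal sketch of the name-based check, with an invented similarity measure: if the identifiers passed as two equally-typed arguments fit the other parameter's position better than their own, the call is flagged as likely swapped:

```java
import java.util.List;

// Flag calls whose argument names match the parameters crosswise.
public class ArgOrderChecker {
    // Crude lexical similarity between a parameter name and an argument name.
    static double similarity(String a, String b) {
        a = a.toLowerCase(); b = b.toLowerCase();
        if (a.equals(b)) return 1.0;
        return (a.contains(b) || b.contains(a)) ? 0.5 : 0.0;
    }

    static boolean looksSwapped(List<String> params, List<String> args) {
        for (int i = 0; i < params.size(); i++)
            for (int j = i + 1; j < params.size(); j++) {
                double asIs    = similarity(params.get(i), args.get(i))
                               + similarity(params.get(j), args.get(j));
                double swapped = similarity(params.get(i), args.get(j))
                               + similarity(params.get(j), args.get(i));
                if (swapped > asIs) return true; // crosswise names fit better
            }
        return false;
    }

    public static void main(String[] args) {
        // void drawRect(int width, int height) called as drawRect(height, width)
        System.out.println(looksSwapped(List.of("width", "height"),
                                        List.of("height", "width"))); // true
    }
}
```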