
Showing papers on "Static program analysis published in 1999"


Book
22 Oct 1999
TL;DR: This book is unique in providing an overview of the four major approaches to program analysis: data flow analysis, constraint-based analysis, abstract interpretation, and type and effect systems.
Abstract: Program analysis utilizes static techniques for computing reliable information about the dynamic behavior of programs. Applications include compilers (for code improvement), software validation (for detecting errors) and transformations between data representations (for solving problems such as Y2K). This book is unique in providing an overview of the four major approaches to program analysis: data flow analysis, constraint-based analysis, abstract interpretation, and type and effect systems. The presentation illustrates the extensive similarities between the approaches, helping readers to choose the best one to utilize.
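To make the first of the four approaches concrete, here is a minimal illustrative sketch of data flow analysis: a standard worklist iteration computing reaching definitions over a small, hand-coded control flow graph. The example program and graph are invented for illustration and are not taken from the book.

```python
# Reaching-definitions analysis via a worklist iteration -- an illustrative
# sketch of classic data flow analysis, not code from the book itself.
# The tiny example program (labels 1..5):
#   1: x = 1
#   2: y = 2
#   3: if x > 0 goto 4 else goto 5
#   4: x = y
#   5: print(x)

# For each node: the variable it defines (or None) and its CFG successors.
DEFS = {1: "x", 2: "y", 3: None, 4: "x", 5: None}
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
PRED = {n: [m for m in SUCC if n in SUCC[m]] for n in SUCC}

def reaching_definitions():
    """Return, for every node, the set of definitions (node labels) that
    may reach the node's entry."""
    in_sets = {n: set() for n in SUCC}
    out_sets = {n: set() for n in SUCC}
    worklist = list(SUCC)
    while worklist:
        n = worklist.pop()
        in_sets[n] = set().union(*(out_sets[p] for p in PRED[n]))
        # kill other definitions of the variable defined here, then add our own
        kill = {d for d in in_sets[n] if DEFS[n] is not None and DEFS[d] == DEFS[n]}
        new_out = (in_sets[n] - kill) | ({n} if DEFS[n] is not None else set())
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            worklist.extend(SUCC[n])   # successors must be revisited
    return in_sets

if __name__ == "__main__":
    for node, defs in sorted(reaching_definitions().items()):
        print(f"node {node}: definitions reaching entry = {sorted(defs)}")
```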

1,955 citations


Patent
Michel K. Bowman-Amuah1
31 Aug 1999
TL;DR: In this article, a system, method, and article of manufacture are provided for affording consistency in a development architecture framework as components in the framework change, and tools are also provided for managing the different versions of the program code.
Abstract: A system, method, and article of manufacture are provided for affording consistency in a development architecture framework as components in the framework change. A reference program code is provided and a plurality of sets of updated program code are received which represent different versions of the program code. The sets of the updated program code are compared with the reference program code in order to identify information relating to changes and the information is classified in relation to the changes. Tools are also provided for managing the different versions of the program code.

819 citations


Proceedings ArticleDOI
30 Aug 1999
TL;DR: This paper shows that it is possible to circumvent this hindrance by applying a language independent and visual approach, i.e. a tool that requires no parsing, yet is able to detect a significant amount of code duplication.
Abstract: Code duplication is one of the factors that severely complicates the maintenance and evolution of large software systems. Techniques for detecting duplicated code exist but rely mostly on parsers, technology that has proven to be brittle in the face of different languages and dialects. In this paper we show that it is possible to circumvent this hindrance by applying a language independent and visual approach, i.e. a tool that requires no parsing, yet is able to detect a significant amount of code duplication. We validate our approach on a number of case studies, involving four different implementation languages and ranging from 256 K up to 13 Mb of source code size.
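The authors' tool presents duplication visually as a dot plot; the toy sketch below illustrates only the underlying language-independent idea of comparing normalized source lines without any parsing (the file contents and the normalization rule are made up for the example).

```python
"""Toy language-independent clone detection: hash normalized source lines and
report lines that recur across files. This only illustrates the parsing-free
idea behind the paper; the actual tool builds a visual dot plot."""
import re
from collections import defaultdict

def normalize(line: str) -> str:
    """Strip and collapse whitespace so layout differences do not hide clones."""
    return re.sub(r"\s+", " ", line.strip())

def find_duplicates(files: dict[str, str], min_len: int = 10):
    """Map each normalized line to every (file, line number) where it occurs."""
    occurrences = defaultdict(list)
    for name, text in files.items():
        for lineno, raw in enumerate(text.splitlines(), start=1):
            norm = normalize(raw)
            if len(norm) >= min_len:          # ignore trivial lines like "}" or "end"
                occurrences[norm].append((name, lineno))
    return {line: places for line, places in occurrences.items() if len(places) > 1}

if __name__ == "__main__":
    sources = {   # hypothetical inputs; any language works since nothing is parsed
        "a.c":   "int total = 0;\nfor (i = 0; i < n; i++)\n    total += v[i];\n",
        "b.pas": "total := 0;\nfor (i = 0; i < n; i++)\n    total += v[i];\n",
    }
    for line, places in find_duplicates(sources).items():
        print(f"{line!r} appears at {places}")
```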

678 citations


Proceedings ArticleDOI
06 Oct 1999
TL;DR: This work confirms the importance of a proper description scheme of the entities being clustered, lists a few good coupling metrics to use, characterizes the quality of different clustering algorithms, and proposes novel description schemes not directly based on the source code.
Abstract: As valuable software systems get old, reverse engineering becomes more and more important to the companies that have to maintain the code. Clustering is a key activity in reverse engineering to discover a better design of the systems or to extract significant concepts from the code. Clustering is an old activity, highly sophisticated, offering many methods to answer different needs. Although these methods have been well documented in the past, these discussions may not apply entirely to the reverse engineering domain. We study some clustering algorithms and other parameters to establish whether and why they could be used for software remodularization. We study three aspects of the clustering activity: abstract descriptions chosen for the entities to cluster; metrics computing coupling between the entities; and clustering algorithms. The experiments were conducted on three public domain systems (gcc, Linux and Mosaic) and a real world legacy system (2 million LOC). Among other things, we confirm the importance of a proper description scheme of the entities being clustered, we list a few good coupling metrics to use and characterize the quality of different clustering algorithms. We also propose novel description schemes not directly based on the source code and we advocate better formal evaluation methods for the clustering results.
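As a rough illustration of the pipeline the paper studies (an abstract description per entity, a coupling metric, a clustering algorithm), the sketch below groups files by a made-up call-based coupling measure using greedy single-link agglomerative clustering; it is not the authors' experimental setup.

```python
"""Illustrative remodularization pipeline: describe each file by the set of
routines it calls, measure coupling as Jaccard overlap of those sets, and
cluster greedily (single-link agglomerative). The data and threshold are
invented; the paper evaluates real systems such as gcc, Linux, and Mosaic."""

# Entity descriptions: file -> set of called routines (hypothetical).
CALLS = {
    "parser.c":   {"next_token", "report_error", "make_node"},
    "lexer.c":    {"next_token", "report_error"},
    "codegen.c":  {"make_node", "emit", "alloc_reg"},
    "regalloc.c": {"alloc_reg", "emit"},
}

def coupling(a: set, b: set) -> float:
    """Jaccard similarity of two description sets (0 = unrelated)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(entities: dict, threshold: float = 0.3):
    """Repeatedly merge the two most coupled clusters until no pair exceeds
    the threshold (single-link agglomerative clustering)."""
    clusters = [{name} for name in entities]
    def link(c1, c2):
        return max(coupling(entities[x], entities[y]) for x in c1 for y in c2)
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda item: item[1])
        if best < threshold:
            break
        clusters[i] |= clusters.pop(j)    # merge the most coupled pair
    return clusters

if __name__ == "__main__":
    for group in cluster(CALLS):
        print(sorted(group))
```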

313 citations


Journal ArticleDOI
TL;DR: A hierarchy of cognitive issues which should be considered during the design of a software exploration tool is described, derived through the examination of program comprehension cognitive models.

249 citations


Proceedings ArticleDOI
30 Aug 1999
TL;DR: This paper presents an environment supporting the generation of tailorable views of object-oriented systems from both static and dynamic information, based on the combination of user-defined queries.
Abstract: Recovering architectural documentation from code is crucial to maintaining and re-engineering software systems. Reverse engineering and program understanding approaches are often limited by the fact that: 1) they propose a fixed set of predefined views, and 2) they consider either purely static or purely dynamic views of the application. In this paper we present an environment supporting the generation of tailorable views of object-oriented systems from both static and dynamic information. Our approach is based on the combination of user-defined queries which allow an engineer to create high-level abstractions and to produce views using these abstractions.

171 citations


Proceedings ArticleDOI
24 Mar 1999
TL;DR: This paper reports a study applying an execution slice-based technique to a reliability and performance evaluator to identify the code which is unique to a feature, or is common to a group of features.
Abstract: An important step towards effective software maintenance is to locate the code relevant to a particular feature. We report a study applying an execution slice-based technique to a reliability and performance evaluator to identify the code which is unique to a feature, or is common to a group of features. Supported by tools called ATAC and χVue, the program features in the source code can be tracked down to files, functions, lines of code, decisions, and then c- or p-uses. Our study suggests that the technique can provide software programmers and maintainers with a good starting point for quick program understanding.
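The core idea of locating code unique to a feature via set operations over execution traces can be shown with a small coverage-based sketch; the coverage data below is invented, and the real study relied on ATAC and χVue rather than this kind of toy code.

```python
"""Execution-slice flavoured feature location: given line coverage from runs
that exercise a feature and runs that do not, the lines unique to the feature
are those covered only by the exercising runs. Coverage sets are invented."""

# (file, line) pairs covered by each run -- hypothetical traces.
RUNS_WITH_FEATURE = [
    {("calc.c", 10), ("calc.c", 11), ("report.c", 40), ("report.c", 41)},
    {("calc.c", 10), ("calc.c", 12), ("report.c", 40), ("report.c", 41)},
]
RUNS_WITHOUT_FEATURE = [
    {("calc.c", 10), ("calc.c", 11), ("report.c", 40)},
]

def unique_to_feature(with_runs, without_runs):
    """Lines executed in every run exercising the feature but in no other run."""
    always_with = set.intersection(*with_runs)
    ever_without = set.union(*without_runs)
    return always_with - ever_without

if __name__ == "__main__":
    for file, line in sorted(unique_to_feature(RUNS_WITH_FEATURE, RUNS_WITHOUT_FEATURE)):
        print(f"{file}:{line} looks specific to the feature")
```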

105 citations


Patent
John S. Haikin1
17 Dec 1999
TL;DR: A software source code version control system for use during the development and maintenance of the source code by multiple software developers in which historical version tracking is provided by maintaining source code on a line-by-line basis in a source code storage as mentioned in this paper.
Abstract: A software source code version control system for use during the development and maintenance of the software source code by multiple software developers in which historical version tracking is provided by maintaining the source code on a line-by-line basis in a source code storage, in which multiple versions of each source code line are stored in the source code storage, wherein each version has a corresponding version code and user code, thereby providing concurrent access to the software source code lines by multiple software developers for modification.

101 citations


Proceedings ArticleDOI
16 Jun 1999
TL;DR: This work addresses the problem of static slicing on binary executables for the purposes of malicious code detection in COTS components by operating directly on binary code without any assumption on the availability of source code, which is realistic and appropriate for the analysis of COTS software products.
Abstract: We address the problem of static slicing on binary executables for the purposes of malicious code detection in COTS components. By operating directly on binary code without any assumption on the availability of source code, our approach is realistic and appropriate for the analysis of COTS software products. To be able to reason on such low-level code, we need a suite of program transformations that aim to get a high level imperative representation of the code. The intention is to significantly improve the analysability while preserving the original semantics. Next we apply slicing techniques to extract those code fragments that are critical from the security standpoint. Finally, these fragments are subjected to verification against behavioral specifications to statically decide whether they exhibit malicious behaviors or not.

94 citations


Patent
19 Aug 1999
TL;DR: In this article, a method for upgrading a software application from a prior version to a subsequent version while preserving user modifications to the prior application is presented, which includes comparing differences between the two versions of the software applications.
Abstract: A method for upgrading (31-b of FIG. 1) a software application (35-b) from a prior version to a subsequent version while preserving user modifications to the prior application. The method includes comparing differences between the two versions of the software applications. This is followed by enumerating the differences between the two versions of the software applications; and determining which differences between the two versions of the software are conflicting and which are compatible. The compatible changes are made (7). Also disclosed is an article of manufacture containing computer readable program code for carrying out the above process, and a program storage device carrying the code.

94 citations


Book ChapterDOI
TL;DR: An overview of program slicing, a discussion of how to slice VHDL programs, a description of the resulting tool, and a brief overview of some applications and experimental results are provided.
Abstract: Hardware description languages (HDLs) are used today to describe circuits at all levels. In large HDL programs, there is a need for source code reduction techniques to address a myriad of problems in formal verification, design, simulation, and testing. Program slicing is a static program analysis technique that allows an analyst to automatically extract portions of programs relevant to the aspects being analyzed. We extend program slicing to HDLs, thus allowing for automatic program reduction to allow the user to focus on relevant code portions. We have implemented a VHDL slicing tool composed of a general inter-procedural slicer and a front-end that captures VHDL execution semantics. This paper provides an overview of program slicing, a discussion of how to slice VHDL programs, a description of the resulting tool, and a brief overview of some applications and experimental results.
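Program slicing itself can be illustrated independently of VHDL: the sketch below computes a backward slice as reachability over a hand-built dependence graph. The statements and dependences are invented; constructing such a graph for real VHDL, with its execution semantics, is precisely the job of the paper's front end.

```python
"""Minimal backward slicing sketch: the slice for a criterion statement is
everything reachable backwards over data/control dependence edges. The tiny
dependence graph is hand-written purely for illustration."""

# statement label -> set of statements it depends on (data or control).
DEPENDS_ON = {
    "s1: n = read()":     set(),
    "s2: i = 0":          set(),
    "s3: sum = 0":        set(),
    "s4: while i < n":    {"s1: n = read()", "s2: i = 0", "s6: i = i + 1"},
    "s5: sum = sum + i":  {"s3: sum = 0", "s4: while i < n", "s6: i = i + 1"},
    "s6: i = i + 1":      {"s2: i = 0", "s4: while i < n"},
    "s7: print(sum)":     {"s5: sum = sum + i"},
    "s8: print(i)":       {"s6: i = i + 1"},
}

def backward_slice(criterion: str) -> set:
    """All statements that may affect the slicing criterion."""
    slice_, stack = set(), [criterion]
    while stack:
        stmt = stack.pop()
        if stmt not in slice_:
            slice_.add(stmt)
            stack.extend(DEPENDS_ON[stmt])   # follow dependences backwards
    return slice_

if __name__ == "__main__":
    # everything except "s8: print(i)" ends up in the slice for print(sum)
    for stmt in sorted(backward_slice("s7: print(sum)")):
        print(stmt)
```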

Journal ArticleDOI
TL;DR: SeeSys is a system embodying a technique for visualizing statistics associated with code that is divided hierarchically into subsystems, directories, and files; it can display the relative sizes of the components in the system, the relative stability of the components, and the location of error-prone code with many bug fixes.
Abstract: SeeSys is a system embodying a technique for visualizing statistics associated with code that is divided hierarchically into subsystems, directories and files. This technique can display the relative sizes of the components in the system, the relative stability of the components, the location of new functionality and the location of error-prone code with many bug fixes. Using animation, it can display the historical evolution of the code. Applying this technique, the source code from a multi-million line production software product is visualized.

Patent
10 Nov 1999
TL;DR: In this paper, a method of round-trip engineering source code from a software model, and in particular a method for forward engineering code previously reverse engineered into a model whereby to generate updated source code without any changes to the code not changed in the model, was presented.
Abstract: A method of round-trip engineering source code from a software model, and in particular a method of forward engineering code previously reverse engineered into a software model whereby to generate updated source code without any changes to the code not changed in the model, and without using obtrusive code markers in the source code. Elements from the original source code represented by the model are placed in a meta-model, and compared to a similar meta-model of the software model. Appropriate changes and additions are made in the source code to elements which have been changed in the software model. The rest of the code in the software model remains untouched.

Journal ArticleDOI
TL;DR: It is argued that the GENOA language is a simple and convenient vehicle for implementing a range of analysis tools, and that the “front-end reuse” approach of GENOA offers an important advantage for tools aimed at large software projects: the reuse of complex, expensive build procedures to run generated tools over large source bases.
Abstract: Code analysis tools provide support for such software engineering tasks as program understanding, software metrics, testing, and reengineering. In this article we describe GENOA, the framework underlying application generators such as Aria and GEN++ which have been used to generate a wide range of practical code analysis tools. This experience illustrates front-end retargetability of GENOA; we describe the features of the GENOA framework that allow it to be used with different front ends. While permitting arbitrary parse tree computations, the GENOA specification language has special, compact iteration operators that are tuned for expressing simple, polynomial-time analysis programs; in fact, there is a useful sublanguage of the GENOA language that can express precisely all (and only) polynomial-time (PTIME) analysis programs on parse trees. Thus, we argue that the GENOA language is a simple and convenient vehicle for implementing a range of analysis tools. We also argue that the “front-end reuse” approach of GENOA offers an important advantage for tools aimed at large software projects: the reuse of complex, expensive build procedures to run generated tools over large source bases. In this article, we describe the GENOA framework and our experiences with it.
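GENOA's specification language is not reproduced here; as a rough analogue of the kind of small parse-tree analysis a generated tool performs, the sketch below walks Python's own ASTs (the standard ast module standing in for a GENOA front end) to report functions that are defined but never called. The check itself is an invented example.

```python
"""A tiny parse-tree analysis in the spirit of generated code analysis tools:
walk the AST of a source file and report functions that are defined but never
called in that file. Python's standard `ast` module plays the front-end role."""
import ast

SOURCE = """
def used(x):
    return x + 1

def unused(y):
    return y * 2

print(used(41))
"""

def uncalled_functions(source: str) -> list[str]:
    tree = ast.parse(source)
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, ast.FunctionDef)}
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return sorted(defined - called)

if __name__ == "__main__":
    print("never called:", uncalled_functions(SOURCE))
```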

Proceedings ArticleDOI
16 May 1999
TL;DR: The CHIME framework provides a flexible, customizable platform for inserting HTML links into software documents using information generated by existing software analysis tools, and enables tool builders to offer customized browsing support with a well-known GUI.
Abstract: Source code browsing is an important part of program comprehension. Browsers expose semantic and syntactic relationships (such as between object references and definitions) in GUI-accessible forms. These relationships are derived using tools which perform static analysis on the original software documents. Implementing such browsers is tricky. Program comprehension strategies vary, and it is necessary to provide the right browsing support. Analysis tools to derive the relevant cross-reference relationships are often difficult to build. Tools to browse distributed documents require extensive coding for the GUI, as well as for data communications. Therefore, there are powerful motivations for using existing static analysis tools in conjunction with WWW technology to implement browsers for distributed software projects. The CHIME framework provides a flexible, customizable platform for inserting HTML links into software documents using information generated by existing software analysis tools. Using the CHIME specification language, and a simple, retargetable database interface, it is possible to quickly incorporate a range of different link insertion tools for software documents, into an existing, legacy software development environment. This enables tool builders to offer customized browsing support with a well-known GUI. This paper describes the CHIME architecture, and describes our experience with several re-targeting efforts of this system.
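The link-insertion step can be sketched very simply: given cross-reference facts produced by some analysis tool, wrap each use of a known identifier in an anchor pointing at its definition. The cross-reference table and file names below are invented, and CHIME's specification language and database interface are not shown.

```python
"""Sketch of the link-insertion idea: given cross-reference facts from a
static analysis tool, rewrite a source listing as HTML in which every use of
a known identifier links to its definition. Table and names are invented."""
import html
import re

# identifier -> URL of its definition, e.g. produced by an external analyzer.
XREF = {"open_file": "defs.html#open_file", "BUF_SIZE": "defs.html#BUF_SIZE"}

def linkify(source: str, xref: dict[str, str]) -> str:
    """Escape the source for HTML and wrap known identifiers in <a> tags."""
    def replace(match: re.Match) -> str:
        name = match.group(0)
        if name in xref:
            return f'<a href="{xref[name]}">{name}</a>'
        return name
    escaped = html.escape(source)
    linked = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", replace, escaped)
    return f"<pre>{linked}</pre>"

if __name__ == "__main__":
    code = 'fd = open_file("log.txt", BUF_SIZE);'
    print(linkify(code, XREF))
```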

Journal ArticleDOI
TL;DR: A hierarchy-aware classification schema for object-oriented code, where software components are classified according to their behavioral characteristics, such as provided services, employed algorithms, and needed data, is presented.
Abstract: This article presents a hierarchy-aware classification schema for object-oriented code, where software components are classified according to their behavioral characteristics, such as provided services, employed algorithms, and needed data. In the case of reusable application frameworks, these characteristics are constructed from their model, i.e., from the description of the abstract classes specifying both the framework structure and purpose. In conventional object libraries, the characteristics are extracted semiautomatically from class interfaces. Characteristics are term pairs, weighted to represent “how well” they describe component behavior. The set of characteristics associated with a given component forms its software descriptor. A descriptor base is presented where descriptors are organized on the basis of structured relationships, such as similarity and composition. The classification is supported by a thesaurus acting as a language-independent unified lexicon. The descriptor base is conceived for developers who, besides conventionally browsing the descriptors hierarchy, can query the system, specifying a set of desired functionalities and getting a ranked set of adaptable candidates. User feedback is taken into account in order to progressively ameliorate the quality of the descriptors according to the views of the user community. Feedback is made dependent on the user typology through a user profile. Experimental results in terms of recall and precision of the retrieval mechanism against a sample code base are reported.

01 Jan 1999
TL;DR: This dissertation presents an extensible architecture that provides code generation at load-time and continuous program optimization at run-time, and presents two new optimization techniques developed explicitly in the context of continuous optimization: object layout adaptation and trace scheduling.
Abstract: In the wake of dramatic improvements in processor speed, it is often overlooked that much of the software in everyday operation is not making optimal use of the hardware on which it actually runs. Among the reasons for this discrepancy are hardware/software mismatches, performance problems induced by software engineering considerations, and the inability of systems to adapt to the user's behavior. The obvious solution to the problem is to delay code generation until load-time. This is the earliest point at which a piece of software can be fine-tuned to the actual capabilities of the hardware on which it is about to be executed. An even better match can be achieved by replacing the already executing software at regular intervals by new versions constructed on-the-fly using a background code re-optimizer. The code produced using such dynamic re-optimizers is often of a higher quality than can be achieved using static “off-line” compilation because live profiling data can be used to guide optimization decisions and hence the software can adapt to changing usage patterns and the late addition of dynamic link libraries. This dissertation presents an extensible architecture that provides code generation at load-time and continuous program optimization at run-time. It discusses important aspects such as when to optimize a given piece of software and how to optimize it. This dissertation also presents two new optimization techniques developed explicitly in the context of continuous optimization: object layout adaptation and trace scheduling. The former technique continuously improves the storage layout of dynamically allocated data structures to improve data cache locality. The latter increases the instruction level parallelism by continually adapting the instruction schedule to predominantly executed program paths. Using these new techniques, we have measured speed-ups of up to 96% over statically optimized programs under favorable circumstances.

Journal ArticleDOI
TL;DR: A knowledge-based, natural language processing approach to the automated understanding of object-oriented code as an aid to the reuse of object-oriented code is described and a system that implements the approach is examined.
Abstract: An automated tool to assist in the understanding of legacy code components can be useful both in the areas of software reuse and software maintenance. Most previous work in this area has concentrated on functionally-oriented code. Whereas object-oriented code has been shown to be inherently more reusable than functionally-oriented code, in many cases the eventual reuse of the object-oriented code was not considered during development. A knowledge-based, natural language processing approach to the automated understanding of object-oriented code as an aid to the reuse of object-oriented code is described. A system, called the PATRicia system (Program Analysis Tool for Reuse), that implements the approach is examined. The natural language processing/information extraction system that comprises a large part of the PATRicia system is discussed and the knowledge-base of the PATRicia system, in the form of conceptual graphs, is described. Reports provided by natural language generation in the PATRicia system are described.

Proceedings ArticleDOI
02 May 1999
TL;DR: A new method to estimate the fault-proneness of an object class in the early phase, using several complexity metrics for object-oriented software.
Abstract: To analyse the complexity of object-oriented software, several metrics have been proposed. Among them, Chidamber and Kemerer's (1994) metrics are well-known object-oriented metrics. Also, their effectiveness has been empirically evaluated from the viewpoint of estimating the fault-proneness of object-oriented software. In the evaluations, these metrics were applied, not to the design specification but to the source code, because some of them measure the inner complexity of a class, and such information cannot be obtained until the algorithm and the class structure are determined at the end of the design phase. However, the estimation of the fault-proneness should be done in the early phase so as to effectively allocate effort for fixing the faults. This paper proposes a new method to estimate the fault-proneness of an object class in the early phase, using several complexity metrics for object-oriented software. In the proposed method, we introduce four checkpoints into the analysis/design/implementation phase, and we estimate the fault-prone classes using applicable metrics at each checkpoint.
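As a rough illustration of metrics-based fault-proneness estimation, the sketch below scores classes by combining a few design-level metrics in a logistic model with hand-picked weights; the metrics, weights, and threshold are invented and are not the model the paper derives empirically at its four checkpoints.

```python
"""Illustrative metrics-based fault-proneness scoring: combine a few
design-level class metrics with hand-picked weights in a logistic model and
flag classes above a threshold. All numbers are invented for the example."""
import math

# class name -> design-level metrics available before coding is finished
# (number of methods, number of collaborating classes, depth in inheritance tree)
CLASSES = {
    "OrderController": {"methods": 25, "coupling": 9, "depth": 1},
    "Money":           {"methods": 6,  "coupling": 2, "depth": 0},
    "ReportWriter":    {"methods": 14, "coupling": 5, "depth": 2},
}

WEIGHTS = {"methods": 0.12, "coupling": 0.35, "depth": 0.4}   # invented weights
BIAS = -5.0

def fault_proneness(metrics: dict) -> float:
    """Logistic score in [0, 1]; higher means more likely to be fault-prone."""
    z = BIAS + sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    for name, metrics in CLASSES.items():
        score = fault_proneness(metrics)
        flag = "review first" if score > 0.5 else "ok for now"
        print(f"{name:16s} score={score:.2f}  {flag}")
```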

Proceedings Article
01 Jun 1999
TL;DR: An algorithm for building a program flow graph representation of an MPI program that provides a basis for important program analyses useful in software testing, debugging and code optimization.
Abstract: The Message Passing Interface (MPI) has been widely used to develop efficient and portable parallel programs for distributed memory multiprocessors and workstation/PC clusters. In this paper, we present an algorithm for building a program flow graph representation of an MPI program. As an extension of the control flow graph representation of sequential codes, this representation provides a basis for important program analyses useful in software testing, debugging and code optimization.
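A heavily simplified version of the construction: start from each process's sequential control flow edges and add communication edges connecting a send to the receives with a matching peer and tag. The statement lists and the matching rule below are invented and much cruder than the paper's algorithm.

```python
"""Simplified MPI program flow graph: per-process sequential control flow
edges plus communication edges from sends to receives with a matching
(peer, tag). The statement lists and matching rule are invented."""

# Per-process statement sequences; sends/receives carry (peer, tag).
PROCESSES = {
    0: [("compute", None), ("MPI_Send", (1, 7)), ("MPI_Recv", (1, 8)), ("finish", None)],
    1: [("MPI_Recv", (0, 7)), ("compute", None), ("MPI_Send", (0, 8)), ("finish", None)],
}

def build_flow_graph(processes):
    """Return a list of labelled edges between (process, statement index) nodes."""
    edges = []
    # 1. sequential control flow inside each process
    for rank, stmts in processes.items():
        for i in range(len(stmts) - 1):
            edges.append(((f"P{rank}", i), (f"P{rank}", i + 1), "control"))
    # 2. communication edges: a send on rank r to peer p with tag t connects
    #    to a receive on rank p that names r with the same tag
    for rank, stmts in processes.items():
        for i, (op, arg) in enumerate(stmts):
            if op == "MPI_Send":
                peer, tag = arg
                for j, (op2, arg2) in enumerate(processes[peer]):
                    if op2 == "MPI_Recv" and arg2 == (rank, tag):
                        edges.append(((f"P{rank}", i), (f"P{peer}", j), "comm"))
    return edges

if __name__ == "__main__":
    for src, dst, kind in build_flow_graph(PROCESSES):
        print(f"{src} -> {dst}   [{kind}]")
```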

Proceedings ArticleDOI
12 Oct 1999
TL;DR: The requirements of such code generation to obtain a high level of confidence in the correctness of the translation process are outlined and a translator for a state-based modeling language called RSML (Requirements Specification Modeling Language) is described that largely meets these requirements.
Abstract: Automated translation, or code generation, of a formal requirements model to production code can alleviate many of the problems associated with design and implementation. In this paper, we outline the requirements of such code generation to obtain a high level of confidence in the correctness of the translation process. We then describe a translator for a state-based modeling language called RSML (Requirements Specification Modeling Language) that largely meets these requirements.

Proceedings ArticleDOI
30 Aug 1999
TL;DR: This paper describes restructuring of C code into new C++ classes to facilitate both software reuse and software evolution and discusses the transformation tool-set and the design of the individual tools.
Abstract: In this paper, we describe restructuring of C code into new C++ classes. Such restructuring is done to facilitate both software reuse and software evolution. The restructuring is accomplished by restructuring tools and scenarios. We discuss the transformation tool-set and the design of the individual tools. The approach is demonstrated on a case study. We also discuss how this tool-set could be enhanced in the future.

Proceedings ArticleDOI
17 Nov 1999
TL;DR: The issues involved in automatic code generation for high-assurance systems are discussed and a set of requirements that code generators for this domain must satisfy are defined.
Abstract: Although formal requirements specifications can provide a complete and consistent description of a safety-critical software system, designing and developing production quality code from high-level specifications can be a time-consuming and error-prone process. Automated translation, or code generation, of the specification to production code can alleviate many of the problems associated with design and implementation. However, current approaches have been unsuitable for safety-critical environments because they employ complex and/or ad-hoc methods for translation. In this paper we discuss the issues involved in automatic code generation for high-assurance systems and define a set of requirements that code generators for this domain must satisfy. These requirements cover the formality of the translation, the quality of the code generator, and the properties of the generated code.

Proceedings ArticleDOI
05 Jan 1999
TL;DR: A model is investigated that represents the program's sequential execution of modules as a stochastic process, which may help to learn exactly where the system is fragile and under which execution patterns a certain level of reliability can be guaranteed.
Abstract: Assessing the reliability of a software system has always been an elusive target. A program may work very well for a number of years and this same program may suddenly become quite unreliable if its mission is changed by the user. This has led to the conclusion that the failure of a software system is dependent only on what the software is currently doing. If a program is always executing a set of fault-free modules, it will certainly execute indefinitely without any likelihood of failure. A program may execute a sequence of fault-prone modules and still not fail. In this particular case, the faults may lie in a region of the code that is not likely to be expressed during the execution of that module. A failure event can only occur when the software system executes a module that contains faults. If an execution pattern that drives the program into a module that contains faults is never selected, then the program will never fail. Alternatively, a program may execute successfully a module that contains faults just as long as the faults are in code subsets that are not executed. The reliability of the system, then, can only be determined with respect to what the software is currently doing. Future reliability predictions will be bound in their precision by the degree of understanding of future execution patterns. We investigate a model that represents the program's sequential execution of modules as a stochastic process. By analyzing the transitions between modules and their failure counts, we may learn exactly where the system is fragile and under which execution patterns a certain level of reliability can be guaranteed.
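The stochastic view can be made concrete with a small sketch: treat observed module-to-module transitions as a Markov chain, attach to each module a failure probability estimated from its failure counts, and estimate reliability under a usage pattern by simulation. All transition probabilities and failure rates below are invented.

```python
"""Sketch of the module-transition view of reliability: transitions between
modules form a Markov chain, each module carries a failure probability from
its failure counts, and reliability under a usage pattern is estimated by
simulating executions. All counts and probabilities are invented."""
import random

# Transition probabilities between modules ("END" terminates an execution).
TRANSITIONS = {
    "main":   {"parse": 0.7, "report": 0.3},
    "parse":  {"eval": 0.9, "END": 0.1},
    "eval":   {"report": 0.6, "parse": 0.4},
    "report": {"END": 1.0},
}
# Per-module probability of failing when executed (from failure counts).
FAILURE_PROB = {"main": 0.0, "parse": 0.001, "eval": 0.02, "report": 0.005}

def run_once(rng: random.Random) -> bool:
    """Simulate one execution; return True if it completes without failure."""
    module = "main"
    while module != "END":
        if rng.random() < FAILURE_PROB[module]:
            return False                      # a fault in this module was exercised
        choices = TRANSITIONS[module]
        module = rng.choices(list(choices), weights=list(choices.values()))[0]
    return True

def estimate_reliability(runs: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(run_once(rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    print(f"estimated reliability: {estimate_reliability():.4f}")
```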

Book ChapterDOI
07 Jun 1999
TL;DR: This work extends the intraprocedural data-flow framework introduced in [3] to support interprocedural symbolic evaluation and utilizes a novel approach based on an array algebra to handle aliases induced by procedure calls.
Abstract: Symbolic evaluation is a technique aimed at determining dynamic properties of programs. We extend our intraprocedural data-flow framework introduced in [3] to support interprocedural symbolic evaluation. Our data-flow framework utilizes a novel approach based on an array algebra to handle aliases induced by procedure calls. It serves as a basis for static program analysis (e.g., reaching-definitions and alias analysis, worst-case performance estimations, cache analysis). Examples for reaching-definitions as well as alias analysis are presented.
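Symbolic evaluation itself is easy to illustrate on straight-line code: carry a mapping from variables to symbolic expressions and update it statement by statement. The sketch below uses a tiny home-grown expression representation and shows only the intraprocedural core, not the paper's array algebra or interprocedural machinery.

```python
"""Tiny symbolic evaluator for straight-line assignments: the state maps each
variable to a symbolic expression over the inputs, so properties such as
"z equals 2*(n+1)" can be read off without running the program."""

# Statements: (target, operator, operand1, operand2); operands are variable
# names or integer constants.
PROGRAM = [
    ("x", "+", "n", 1),      # x = n + 1
    ("y", "*", "x", 2),      # y = x * 2
    ("z", "+", "y", 0),      # z = y + 0
]

def sym(value, state):
    """Look up a variable's current symbolic value, or keep a constant as-is."""
    return state.get(value, value) if isinstance(value, str) else value

def evaluate(program, inputs=("n",)):
    state = {v: v for v in inputs}          # inputs start as free symbols
    for target, op, a, b in program:
        left, right = sym(a, state), sym(b, state)
        if op == "+" and right == 0:        # a trivial simplification rule
            state[target] = left
        else:
            state[target] = (op, left, right)
    return state

if __name__ == "__main__":
    for var, expr in evaluate(PROGRAM).items():
        print(f"{var} = {expr}")
```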

Proceedings ArticleDOI
10 Feb 1999
TL;DR: The preliminary design of a DynaMICs snoopy-coprocessor system is presented, i.e., one that employs a coprocessor that utilizes bus-monitoring hardware to facilitate the concurrent execution of the application and constraint-checking code.
Abstract: Dynamic Monitoring with Integrity Constraints (DynaMICs) is a software-fault monitoring approach in which the constraints are maintained separately from the program. Since the constraints are not entwined in the code, the approach facilitates the maintenance of the application and constraint code. Through code analysis during compilation, the points at which constraint checking should occur are determined. DynaMICs minimizes performance degradation, addressing a problem that has limited the use of runtime software-fault monitoring. This paper presents the preliminary design of a DynaMICs snoopy-coprocessor system, i.e., one that employs a coprocessor that utilizes bus-monitoring hardware to facilitate the concurrent execution of the application and constraint-checking code. In this approach, the coprocessor executes the constraint-checking code while the main processor executes the application code.

Proceedings ArticleDOI
05 May 1999
TL;DR: An observational study with professional programmers performing a debugging and an enhancement task finds that static analysis tools, specifically data flow and slicing tools, are considered useful for software maintenance.
Abstract: Since Weiser's study (see CACM, vol.25, no.7, p.446-52), static analysis tools, specifically data flow and slicing tools, have been considered useful for software maintenance. To investigate this further, we conducted an observational study with professional programmers performing a debugging and an enhancement task.

Patent
Chung T. Nguyen1
07 Dec 1999
TL;DR: Disclosed as discussed by the authors is a system, method, and program for generating a compiler to map a code set to object code capable of being executed on an operating system platform, where at least one neural network is trained to convert the code set into object code.
Abstract: Disclosed is a system, method, and program for generating a compiler to map a code set to object code capable of being executed on an operating system platform. At least one neural network is trained to convert the code set to object code. The at least one trained neural network can then be used to convert the code set to the object code.

Patent
21 Dec 1999
TL;DR: In this article, a process that takes converted system software code as input to a program created with a commercial compiler to produce a simulated Operator System Interface (OSI) is described.
Abstract: A process that takes converted system software code as input to a program created with a commercial compiler to produce a simulated Operator System Interface. An Operator System Interface is produced using program data structures rather than extensive emulation or updating of program source code. These files can then be modified or updated and converted back to both drive the system software and the Software Specification. This eliminates extensive manpower-intensive emulation, and significantly reduces update work on actual system source code.

Proceedings Article
08 Nov 1999
TL;DR: This work presents a software repository system based on an existing information retrieval system for structured text, where source code is treated as text, augmented with supplementary syntactic and semantic information.
Abstract: Software repositories, used to support program development and maintenance, invariably require an abstract model of the source code. This requirement restricts the repository user to the analyses and queries supported by the data model of the repository. In this work, we present a software repository system based on an existing information retrieval system for structured text. Source code is treated as text, augmented with supplementary syntactic and semantic information. Both the source text and supplementary information can then be queried to retrieve elements of the code. No transformations are necessary to satisfy the requirements of a database storage model. As a result, the system is free of many of the limitations imposed by existing systems.
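A greatly simplified rendition of the idea, querying source text augmented with lightweight syntactic tags rather than a fixed database schema, is sketched below; the tagging patterns and query form are invented and do not reflect the structured-text retrieval system the work builds on.

```python
"""Sketch of 'source code as queryable structured text': index each line
together with a few lightweight syntactic tags, then answer queries that mix
plain text with those tags. Tagging patterns and query form are invented."""
import re

TAGGERS = {                       # tag -> pattern that marks a line with it
    "function-def": re.compile(r"^\s*def\s+\w+"),
    "call":         re.compile(r"\w+\s*\("),
    "comment":      re.compile(r"#"),
}

def index(source: str):
    """Return a list of (line number, text, tags) records for the source."""
    records = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        tags = {tag for tag, pat in TAGGERS.items() if pat.search(line)}
        records.append((lineno, line, tags))
    return records

def query(records, text: str = "", tag: str = ""):
    """Lines whose text contains `text` and whose tags include `tag`."""
    return [(n, line) for n, line, tags in records
            if text in line and (not tag or tag in tags)]

if __name__ == "__main__":
    code = "def mean(xs):\n    # average of xs\n    return sum(xs) / len(xs)\n"
    records = index(code)
    print(query(records, text="xs", tag="call"))
```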