
Showing papers in "ACM Transactions on Software Engineering and Methodology in 2014"


Journal ArticleDOI
TL;DR: The study confirms that EVOSUITE can achieve good levels of branch coverage in practice, and exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects.
Abstract: Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the EVOSUITE tool by evaluating it on manually selected open-source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value; even if EVOSUITE worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large “real-world” software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them are trivial tasks. In this article we present the results of a large experiment in unit test generation using the EVOSUITE tool on 100 randomly chosen open-source projects, the 10 most popular open-source projects according to the SourceForge Web site, seven industrial projects, and 11 automatically generated software projects. The study confirms that EVOSUITE can achieve good levels of branch coverage (on average, 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances: branch coverage on an unbiased sample is affected by predominant environmental dependencies. The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a larger appreciation of work in this area, thus supporting transfer of knowledge from software testing research to practice.

176 citations


Journal ArticleDOI
TL;DR: The results indicate that this approach is effective at finding relevant code, can be used on its own or to filter results from keyword searches to increase search precision, and is adaptable to find approximate matches and then guide modifications to match the user specifications when exact matches do not already exist.
Abstract: Programmers frequently search for source code to reuse using keyword searches. The search effectiveness in facilitating reuse, however, depends on the programmer's ability to specify a query that captures how the desired code may have been implemented. Further, the results often include many irrelevant matches that must be filtered manually. More semantic search approaches could address these limitations, yet existing approaches are either not flexible enough to find approximate matches or require the programmer to define complex specifications as queries. We propose a novel approach to semantic code search that addresses several of these limitations and is designed for queries that can be described using a concrete input/output example. In this approach, programmers write lightweight specifications as inputs and expected outputs. Unlike existing approaches to semantic search, we use an SMT solver to identify programs or program fragments in a repository (automatically transformed into constraints using symbolic analysis) that match the programmer-provided specification. We instantiated and evaluated this approach in subsets of three languages, the Java String library, the Yahoo! Pipes mashup language, and SQL select statements, exploring its generality, utility, and trade-offs. The results indicate that this approach is effective at finding relevant code, can be used on its own or to filter results from keyword searches to increase search precision, and is adaptable to find approximate matches and then guide modifications to match the user specifications when exact matches do not already exist. These gains in precision and flexibility come at the cost of performance, for which underlying factors and mitigation strategies are identified.
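The matching step can be pictured with a small, hypothetical sketch (not the paper's tooling or encoding): a candidate snippet is summarised as constraints and checked against a programmer-provided input/output example using the Z3 SMT solver's string theory.

    # Illustrative sketch: checking one candidate snippet against an I/O example
    # with Z3's string theory (the encoding and names are hypothetical).
    from z3 import String, IntVal, SubString, Length, Solver, sat

    def matches_example(encode_candidate, example_input, example_output):
        """Return True if the candidate's constraint summary admits the I/O pair."""
        solver = Solver()
        inp, out = String('inp'), String('out')
        solver.add(inp == example_input, out == example_output)
        solver.add(encode_candidate(inp, out))   # symbolic summary of the snippet
        return solver.check() == sat

    def drop_first_char(inp, out):
        # encodes a candidate such as: return s.substring(1);
        return out == SubString(inp, IntVal(1), Length(inp) - 1)

    print(matches_example(drop_first_char, "hello", "ello"))   # True  -> relevant match
    print(matches_example(drop_first_char, "hello", "hell"))   # False -> filtered out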

126 citations


Journal ArticleDOI
TL;DR: Those situations where the use of multilevel modelling is beneficial, and how often the identified patterns arise in practice are discussed, and a wide range of existing two-level DSMLs from different sources and domains are analysed to detect when their elements could be rearranged in more than two metalevels.
Abstract: Model-Driven Engineering (MDE) promotes models as the primary artefacts in the software development process, from which code for the final application is derived. Standard approaches to MDE (like those based on MOF or EMF) advocate a two-level metamodelling setting where Domain-Specific Modelling Languages (DSMLs) are defined through a metamodel that is instantiated to build models at the metalevel below. Multilevel modelling (also called deep metamodelling) extends the standard approach to metamodelling by enabling modelling at an arbitrary number of metalevels, not necessarily two. Proposers of multilevel modelling claim this leads to simpler model descriptions in some situations, although its applicability has been scarcely evaluated. Thus, practitioners may find it difficult to discern when to use it and how to implement multilevel solutions in practice. In this article, we discuss those situations where the use of multilevel modelling is beneficial, and identify recurring patterns and idioms. Moreover, in order to assess how often the identified patterns arise in practice, we have analysed a wide range of existing two-level DSMLs from different sources and domains, to detect when their elements could be rearranged in more than two metalevels. The results show that this scenario is not uncommon and that in some application domains (like software architecture and enterprise/process modelling) it is pervasive, with a high average number of pattern occurrences per metamodel.

124 citations


Journal ArticleDOI
TL;DR: This work proposes an automated approach to help developers improve the quality of software modularization by analyzing underlying latent topics in source code as well as structural dependencies to recommend (and explain) refactoring operations aimed at moving a class to a more suitable package.
Abstract: Oftentimes, during software maintenance the original program modularization decays, thus reducing its quality. One of the main reasons for such architectural erosion is suboptimal placement of source-code classes in software packages. To alleviate this issue, we propose an automated approach to help developers improve the quality of software modularization. Our approach analyzes underlying latent topics in source code as well as structural dependencies to recommend (and explain) refactoring operations aiming at moving a class to a more suitable package. The topics are acquired via Relational Topic Models (RTM), a probabilistic topic modeling technique. The resulting tool, coined as R3 (Rational Refactoring via RTM), has been evaluated in two empirical studies. The results of the first study conducted on nine software systems indicate that R3 provides a coupling reduction from 10% to 30% among the software modules. The second study with 62 developers confirms that R3 is able to provide meaningful recommendations (and explanations) for move class refactoring. Specifically, more than 70% of the recommendations were considered meaningful from a functional point of view.
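As a rough illustration of the recommendation step only (the scoring below is a placeholder; R3 derives its textual similarity from Relational Topic Models), a move-class suggestion can be pictured as ranking candidate packages by a mix of textual similarity and structural dependencies:

    # Illustrative sketch: rank candidate packages for a "move class" recommendation
    # by mixing a (precomputed) textual-similarity score with a structural score.
    def recommend_package(cls, packages, text_sim, dep_count, alpha=0.5):
        """text_sim[(cls, pkg)]: topic similarity in [0, 1] (assumed given);
        dep_count[(cls, pkg)]: structural dependencies from cls into pkg."""
        max_deps = max(dep_count.get((cls, p), 0) for p in packages) or 1
        def score(pkg):
            deps = dep_count.get((cls, pkg), 0) / max_deps
            return alpha * text_sim.get((cls, pkg), 0.0) + (1 - alpha) * deps
        return max(packages, key=score)

    packages = ["util", "io", "net"]
    text_sim = {("Parser", "io"): 0.7, ("Parser", "util"): 0.4}
    dep_count = {("Parser", "io"): 9, ("Parser", "util"): 2}
    print(recommend_package("Parser", packages, text_sim, dep_count))   # -> io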

123 citations


Journal ArticleDOI
TL;DR: A unified test case prioritization approach is presented that encompasses both the total and additional strategies; a wide range of techniques derived from its basic and extended models with uniform fp values can outperform purely total techniques and are competitive with purely additional techniques.
Abstract: Test case prioritization techniques attempt to reorder test cases in a manner that increases the rate at which faults are detected during regression testing. Coverage-based test case prioritization techniques typically use one of two overall strategies: a total strategy or an additional strategy. These strategies prioritize test cases based on the total number of code (or code-related) elements covered per test case and the number of additional (not yet covered) code (or code-related) elements covered per test case, respectively. In this article, we present a unified test case prioritization approach that encompasses both the total and additional strategies. Our unified test case prioritization approach includes two models (basic and extended) by which a spectrum of test case prioritization techniques ranging from a purely total to a purely additional technique can be defined by specifying the value of a parameter referred to as the fp value. To evaluate our approach, we performed an empirical study on 28 Java objects and 40 C objects, considering the impact of three internal factors (model type, choice of fp value, and coverage type) and three external factors (coverage granularity, test case granularity, and programming/testing paradigm), all of which can be manipulated by our approach. Our results demonstrate that a wide range of techniques derived from our basic and extended models with uniform fp values can outperform purely total techniques and are competitive with purely additional techniques. Considering the influence of each internal and external factor studied, the results demonstrate that various values of each factor have nontrivial influence on test case prioritization techniques.
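The spectrum between the two strategies can be illustrated with a simple greedy prioritizer in which already-covered elements keep a residual weight fp (this interpolation only sketches the general idea, not the paper's basic or extended models): fp = 1 behaves like the total strategy, and fp = 0 like the additional strategy.

    # Illustrative sketch of the total/additional spectrum: already-covered
    # elements keep residual weight fp when scoring the next test.
    def prioritize(coverage, fp):
        """coverage: dict test -> set of covered elements (e.g., branches)."""
        remaining, ordered, covered = dict(coverage), [], set()
        while remaining:
            def score(test):
                return sum(fp if e in covered else 1.0 for e in remaining[test])
            best = max(remaining, key=score)
            ordered.append(best)
            covered |= remaining.pop(best)
        return ordered

    coverage = {"tA": {1, 2, 3, 4, 5}, "tB": {1, 2, 3, 4}, "tC": {6, 7, 8}}
    print(prioritize(coverage, fp=1.0))   # total strategy:      ['tA', 'tB', 'tC']
    print(prioritize(coverage, fp=0.0))   # additional strategy: ['tA', 'tC', 'tB']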

96 citations


Journal ArticleDOI
TL;DR: This work shows that CC is prevalent in both of its forms and demonstrates that it is a safety reducing factor for Coverage-Based Fault Localization (CBFL), and proposes two techniques for cleansing test suites from coincidental correctness to enhance CBFL.
Abstract: Researchers have argued that for a failure to be observed the following three conditions must be met: CR = the defect was reached; CI = the program has transitioned into an infectious state; and CP = the infection has propagated to the output. Coincidental Correctness (CC) arises when the program produces the correct output while condition CR is met but not CP. We recognize two forms of coincidental correctness, weak and strong. In weak CC, CR is met and CI might or might not be met, whereas in strong CC, both CR and CI are met. In this work we first show that CC is prevalent in both of its forms and demonstrate that it is a safety-reducing factor for Coverage-Based Fault Localization (CBFL). We then propose two techniques for cleansing test suites of coincidental correctness to enhance CBFL, given that the test cases have already been classified as failing or passing. We evaluated the effectiveness of our techniques by empirically quantifying their accuracy in identifying weak CC tests. The results were promising; for example, the better-performing technique, using 105 test suites and statement coverage, exhibited 9% false negatives, 30% false positives, and neither false negatives nor false positives in 14.3% of the test suites. Also, using 73 test suites and more complex coverage, the numbers were 12%, 19%, and 15%, respectively.
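A toy calculation shows why coincidentally correct passing tests hurt coverage-based localization and why cleansing helps (the Ochiai metric used here is one standard CBFL formula; the numbers are made up):

    # Passing tests that cover the defect depress its suspiciousness; removing
    # them (the cleansing step) restores it.
    from math import sqrt

    def ochiai(cov_f, cov_p, total_f):
        """cov_f / cov_p: failing / passing tests covering the statement."""
        return 0.0 if cov_f == 0 else cov_f / sqrt(total_f * (cov_f + cov_p))

    total_failing = 4
    # Defective statement covered by all 4 failing tests and by 12 passing tests,
    # 10 of which are coincidentally correct (they reach the defect but pass).
    print(round(ochiai(4, 12, total_failing), 3))   # 0.5   with CC tests kept
    print(round(ochiai(4, 2, total_failing), 3))    # 0.816 after cleansing CC tests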

87 citations


Journal ArticleDOI
TL;DR: In this paper, a family of four controlled experiments, involving 139 participants having different profiles, were conducted on the requirements documents of two desktop applications, and they found a statistically significant large effect of the presence of screen mockups on both comprehension effectiveness and comprehension task efficiency.
Abstract: Over the last few years, the software engineering community has proposed a number of modeling methods to represent functional requirements. Among them, use cases are recognized as an easy to use and intuitive way to capture and define such requirements. Screen mockups (also called user-interface sketches or user-interface mockups) have been proposed as a complement to use cases for improving the comprehension of functional requirements. In this article, we aim at quantifying the benefits achievable by augmenting use cases with screen mockups in the comprehension of functional requirements with respect to effectiveness, effort, and efficiency. For this purpose, we conducted a family of four controlled experiments, involving 139 participants having different profiles. The experiments involved comprehension tasks performed on the requirements documents of two desktop applications. Independently from the participants' profile, we found a statistically significant large effect of the presence of screen mockups on both comprehension effectiveness and comprehension task efficiency. No significant effect was observed on the effort to complete tasks. The main pragmatic lesson is that adding screen mockups to use cases can almost double the efficiency of comprehension tasks.

81 citations


Journal ArticleDOI
TL;DR: The statistical analysis of the experiments over 31 runs on nine open-source systems and one industrial project shows that seven types of code smells were detected with an average of more than 86% in terms of precision and recall, which confirms that the bilevel proposal outperforms state-of-the-art code-smell detection techniques.
Abstract: Code smells represent design situations that can affect the maintenance and evolution of software. They make the system difficult to evolve. Code smells are detected, in general, using quality metrics that represent some symptoms. However, the selection of suitable quality metrics is challenging due to the absence of consensus in identifying some code smells based on a set of symptoms and also the high calibration effort in determining manually the threshold value for each metric. In this article, we propose treating the generation of code-smell detection rules as a bilevel optimization problem. Bilevel optimization problems represent a class of challenging optimization problems, which contain two levels of optimization tasks. In these problems, only the optimal solutions to the lower-level problem become possible feasible candidates to the upper-level problem. In this sense, the code-smell detection problem can be treated as a bilevel optimization problem, but due to the lack of suitable solution techniques, it has in the past been addressed as a single-level optimization problem. In our adaptation here, the upper-level problem generates a set of detection rules, a combination of quality metrics, which maximizes the coverage of the base of code-smell examples and of the artificial code smells generated by the lower level. The lower level maximizes the number of generated artificial code smells that cannot be detected by the rules produced by the upper level. The main advantage of our bilevel formulation is that the generation of detection rules is not limited to the code-smell examples identified manually by developers, which are difficult to collect, but also allows the prediction of new code-smell behavior that differs from that in the base of examples. The statistical analysis of our experiments over 31 runs on nine open-source systems and one industrial project shows that seven types of code smells were detected with an average of more than 86% in terms of precision and recall. The results confirm that our bilevel proposal outperforms state-of-the-art code-smell detection techniques. The evaluation performed by software engineers also confirms the relevance of the detected code smells for improving the quality of software systems.
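A schematic sketch of the bilevel structure (random search stands in for the evolutionary algorithms used in the article, and the metrics, thresholds, and scoring are toy placeholders):

    import random

    def detects(rule, cls):
        # a detection rule = metric thresholds; flag the class if all are exceeded
        return all(cls[m] > t for m, t in rule.items())

    def random_artificial_smell():
        return {"LOC": random.randint(50, 800), "NOM": random.randint(5, 60)}

    def lower_level(rule, tries=200):
        # adversary: count artificial smelly classes that escape the rule
        return sum(not detects(rule, random_artificial_smell()) for _ in range(tries))

    def upper_level(smell_examples, rounds=100):
        best_rule, best_score = None, float("-inf")
        for _ in range(rounds):
            rule = {"LOC": random.randint(100, 500), "NOM": random.randint(10, 40)}
            covered = sum(detects(rule, e) for e in smell_examples)
            score = covered - 0.01 * lower_level(rule)   # penalize escaping smells
            if score > best_score:
                best_rule, best_score = rule, score
        return best_rule

    examples = [{"LOC": 620, "NOM": 45}, {"LOC": 410, "NOM": 33}]
    print(upper_level(examples))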

74 citations


Journal ArticleDOI
TL;DR: This article proposes a novel adaptive temporal violation handling point selection strategy that exploits the phenomenon of self-recovery, in which execution delays can be automatically compensated for by the saved execution time of subsequent workflow activities, a phenomenon that has so far been entirely overlooked.
Abstract: Scientific processes are usually time constrained with overall deadlines and local milestones. In scientific workflow systems, due to the dynamic nature of the underlying computing infrastructures such as grid and cloud, execution delays often take place and result in a large number of temporal violations. Since temporal violation handling is expensive in terms of both monetary costs and time overheads, an essential question that arises is “do we need to handle every temporal violation in scientific workflow systems?” The answer would be “true” according to existing work on workflow temporal management, which adopts a philosophy similar to the handling of functional exceptions, that is, that every temporal violation should be handled whenever it is detected. However, based on our observation, the phenomenon of self-recovery, where execution delays can be automatically compensated for by the saved execution time of subsequent workflow activities, has been entirely overlooked. Therefore, considering the nonfunctional nature of temporal violations, our answer is “not necessarily true.” To take advantage of self-recovery, this article proposes a novel adaptive temporal violation handling point selection strategy in which this phenomenon is effectively utilised to avoid unnecessary temporal violation handling. Based on simulations of both real-world scientific workflows and randomly generated test cases, the experimental results demonstrate that our strategy can significantly reduce the cost of temporal violation handling by over 96% while maintaining extremely low violation rates under normal circumstances.
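The self-recovery intuition can be sketched as a simple decision rule (illustrative numbers and a simplified model, not the paper's strategy): handle a violation only when the expected time saved by the remaining activities cannot absorb the current delay.

    def needs_handling(elapsed, remaining_activities, deadline, confidence=1.0):
        """remaining_activities: list of (mean_duration, expected_saving) tuples."""
        expected_remaining = sum(mean for mean, _ in remaining_activities)
        expected_saving = sum(saving for _, saving in remaining_activities)
        projected_finish = elapsed + expected_remaining - confidence * expected_saving
        return projected_finish > deadline

    # A delayed workflow: three activities left, each likely to finish a bit early.
    remaining = [(12, 2), (10, 3), (15, 4)]
    print(needs_handling(70, remaining, deadline=95))   # True  -> handle the violation
    print(needs_handling(60, remaining, deadline=95))   # False -> self-recovery likely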

64 citations


Journal ArticleDOI
TL;DR: The confounding effect of class size on the associations between object-oriented metrics and fault-proneness in general exists and the proposed linear regression-based method can effectively remove the confounding effect, and the prediction performance of fault prediction models with respect to both ranking and classification can in general be significantly improved.
Abstract: Background: The extent of the potentially confounding effect of class size in the fault prediction context is not clear, nor is the method to remove the potentially confounding effect, or the influence of this removal on the performance of fault-proneness prediction models. Objective: We aim to provide an in-depth understanding of the effect of class size on the true associations between object-oriented metrics and fault-proneness. Method: We first employ statistical methods to examine the extent of the potentially confounding effect of class size in the fault prediction context. After that, we propose a linear regression-based method to remove the potentially confounding effect. Finally, we empirically investigate whether this removal could improve the prediction performance of fault-proneness prediction models. Results: Based on open-source software systems, we found: (a) the confounding effect of class size on the associations between object-oriented metrics and fault-proneness in general exists; (b) the proposed linear regression-based method can effectively remove the confounding effect; and (c) after removing the confounding effect, the prediction performance of fault prediction models with respect to both ranking and classification can in general be significantly improved. Conclusion: We should remove the confounding effect of class size when building fault prediction models.
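The regression-based removal can be sketched as follows (illustrative only; the paper's exact procedure and metrics may differ): regress each object-oriented metric on class size and keep the residuals, which are uncorrelated with size, as the cleaned metric.

    import numpy as np

    def remove_size_confound(metric, size):
        m = np.asarray(metric, dtype=float)
        s = np.log(np.asarray(size, dtype=float))        # class size, e.g., SLOC
        X = np.column_stack([np.ones_like(s), s])
        beta, *_ = np.linalg.lstsq(X, m, rcond=None)     # metric ~ a + b * log(size)
        return m - X @ beta                              # residual, size removed

    sloc = [120, 300, 80, 950, 410]
    cbo  = [4, 9, 3, 21, 11]                             # coupling between objects
    print(np.round(remove_size_confound(cbo, sloc), 2))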

63 citations


Journal ArticleDOI
TL;DR: This work proposes the first SMT-solver-based method for formally verifying the security of a masking countermeasure against power-analysis-based side-channel attacks, and implements the proposed method in a software verification tool based on the LLVM compiler frontend and the Yices SMT solver.
Abstract: A common strategy for designing countermeasures against power-analysis-based side-channel attacks is using random masking techniques to remove the statistical dependency between sensitive data and side-channel emissions. However, this process is both labor intensive and error prone and, currently, there is a lack of automated tools to formally assess how secure a countermeasure really is. We propose the first SMT-solver-based method for formally verifying the security of a masking countermeasure against such attacks. In addition to checking whether the sensitive data are masked by random variables, we also check whether they are perfectly masked, that is, whether the intermediate computation results in the implementation of a cryptographic algorithm are independent of the secret key. We encode this verification problem using a series of quantifier-free first-order logic formulas, whose satisfiability can be decided by an off-the-shelf SMT solver. We have implemented the proposed method in a software verification tool based on the LLVM compiler frontend and the Yices SMT solver. Our experiments on a set of recently proposed masking countermeasures for cryptographic algorithms such as AES and MAC-Keccak show the method is both effective in detecting power side-channel leaks and scalable for practical use.
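The property being verified can be illustrated with a toy check that compares distributions by brute-force enumeration rather than the paper's SMT encoding (the masking functions below are toy examples): an intermediate value is perfectly masked if its distribution over the random mask is the same for every secret.

    from collections import Counter

    def perfectly_masked(intermediate, bits=8):
        values = range(2 ** bits)
        dists = {k: Counter(intermediate(k, r) for r in values) for k in values}
        first = next(iter(dists.values()))
        return all(d == first for d in dists.values())

    print(perfectly_masked(lambda k, r: k ^ r))   # True:  XOR with the mask hides k
    print(perfectly_masked(lambda k, r: k & r))   # False: AND leaks the secret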

Journal ArticleDOI
TL;DR: A framework derived from the literature on Inner Source is presented, which identifies nine important factors that need to be considered when implementing Inner Source, and can be used as a probing instrument to assess an organization on these nine factors so as to gain an understanding of whether or not Inner Source is suitable.
Abstract: A number of organizations have adopted Open Source Software (OSS) development practices to support or augment their software development processes, a phenomenon frequently referred to as Inner Source. However, the adoption of Inner Source is not a straightforward issue. Many organizations are struggling with the question of whether Inner Source is an appropriate approach to software development for them in the first place. This article presents a framework derived from the literature on Inner Source, which identifies nine important factors that need to be considered when implementing Inner Source. The framework can be used as a probing instrument to assess an organization on these nine factors so as to gain an understanding of whether or not Inner Source is suitable. We applied the framework in three case studies at Philips Healthcare, Neopost Technologies, and Rolls-Royce, which are all large organizations that have either adopted Inner Source or were planning to do so. Based on the results presented in this article, we outline directions for future research.

Journal ArticleDOI
TL;DR: A definition of requirements engineering and software test (REST) alignment, a taxonomy that characterizes the methods linking the respective areas, and a process to assess alignment are proposed to support researchers to identify new opportunities for investigation and practitioners to compare alignment methods and evaluate alignment, or lack thereof.
Abstract: Requirements Engineering and Software Testing are mature areas and have seen a lot of research. Nevertheless, their interactions have been sparsely explored beyond the concept of traceability. To fill this gap, we propose a definition of requirements engineering and software test (REST) alignment, a taxonomy that characterizes the methods linking the respective areas, and a process to assess alignment. The taxonomy can support researchers in identifying new opportunities for investigation, as well as practitioners in comparing alignment methods and evaluating alignment, or the lack thereof. We constructed the REST taxonomy by analyzing alignment methods published in the literature, iteratively validating the emerging dimensions. The resulting concept of an information dyad characterizes the exchange of information required for any alignment to take place. We demonstrate the use of the taxonomy by applying it to five in-depth cases and illustrate angles of analysis on a set of thirteen alignment methods. In addition, we developed an assessment framework (REST-bench), applied it in an industrial assessment, and showed that, with low effort, it can identify opportunities to improve REST alignment. Although we expect that the taxonomy can be further refined, we believe that the information dyad is a valid and useful construct to understand alignment.

Journal ArticleDOI
TL;DR: A degree-of-knowledge model is introduced that computes automatically, for each source-code element in a codebase, a real value that represents a developer's knowledge of that element based on the developer's authorship and interaction data.
Abstract: As a software system evolves, the system's codebase constantly changes, making it difficult for developers to answer such questions as who is knowledgeable about particular parts of the code or who needs to know about changes made. In this article, we show that an externalized model of a developer's individual knowledge of code can make it easier for developers to answer such questions. We introduce a degree-of-knowledge model that computes automatically, for each source-code element in a codebase, a real value that represents a developer's knowledge of that element based on a developer's authorship and interaction data. We present evidence that shows that both authorship and interaction data of the code are important in characterizing a developer's knowledge of code. We report on the usage of our model in case studies on expert finding, knowledge transfer, and identifying changes of interest. We show that our model improves upon an existing expertise-finding approach and can accurately identify changes for which a developer should likely be aware. We discuss how our model may provide a starting point for knowledge transfer but that more refinement is needed. Finally, we discuss the robustness of the model across multiple development sites.
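A rough sketch of what such a score could look like (the weights below are illustrative placeholders, not the coefficients fitted in the article): authorship information is combined with interaction information, and changes made by other developers erode the score.

    import math

    def degree_of_knowledge(first_author, deliveries, acceptances, interactions,
                            w=(1.0, 0.5, -0.3, 0.4)):
        """first_author: 1 if the developer created the element, else 0
        deliveries:   changes the developer made to the element
        acceptances:  changes other developers made to it (erodes knowledge)
        interactions: how often the developer selected/edited it in the IDE"""
        w_fa, w_dl, w_ac, w_in = w
        authorship = (w_fa * first_author + w_dl * math.log(1 + deliveries)
                      + w_ac * math.log(1 + acceptances))
        interest = w_in * math.log(1 + interactions)
        return authorship + interest

    print(round(degree_of_knowledge(1, 5, 0, 12), 2))   # original author, still active
    print(round(degree_of_knowledge(0, 0, 9, 1), 2))    # element changed by others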

Journal ArticleDOI
TL;DR: A mechanism is devised to establish traceability between (functional) safety requirements and SysML design models and to extract design slices that filter out irrelevant details but keep enough context information for the slices to be easy to inspect and understand.
Abstract: Certifying safety-critical software and ensuring its safety requires checking the conformance between safety requirements and design. Increasingly, the development of safety-critical software relies on modeling, and the System Modeling Language (SysML) is now commonly used in many industry sectors. Inspecting safety conformance by comparing design models against safety requirements requires safety inspectors to browse through large models and is consequently time consuming and error-prone. To address this, we have devised a mechanism to establish traceability between (functional) safety requirements and SysML design models to extract design slices (model fragments) that filter out irrelevant details but keep enough context information for the slices to be easy to inspect and understand. In this article, we report on a controlled experiment assessing the impact of the traceability and slicing mechanism on inspectors' conformance decisions and effort. Results show a significant decrease in effort and an increase in decisions' correctness and level of certainty.
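The slicing step can be pictured with a minimal sketch over plain dictionaries (hypothetical element names; no SysML tooling involved): starting from the model elements traced to one safety requirement, follow dependency edges to collect a slice that keeps enough context.

    def slice_for_requirement(req, trace, depends_on):
        """trace: requirement -> directly traced model elements
        depends_on: element -> elements it needs for context"""
        slice_, frontier = set(), list(trace.get(req, []))
        while frontier:
            elem = frontier.pop()
            if elem not in slice_:
                slice_.add(elem)
                frontier.extend(depends_on.get(elem, []))
        return slice_

    trace = {"SR-12": ["BrakeController"]}
    depends_on = {"BrakeController": ["SpeedSensor", "BrakeActuator"],
                  "SpeedSensor": ["CANBus"]}
    print(sorted(slice_for_requirement("SR-12", trace, depends_on)))
    # ['BrakeActuator', 'BrakeController', 'CANBus', 'SpeedSensor']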

Journal ArticleDOI
TL;DR: A novel abstraction called Join Point Interface is introduced, which, by design, aids modular reasoning and independent evolution by decoupling aspects from base code and by providing a modular type-checking algorithm.
Abstract: In current aspect-oriented systems, aspects usually carry, through their pointcuts, explicit references to the base code. Those references are fragile and hinder important software engineering properties such as modular reasoning and independent evolution of aspects and base code. In this work, we introduce a novel abstraction called Join Point Interface, which, by design, aids modular reasoning and independent evolution by decoupling aspects from base code and by providing a modular type-checking algorithm. Join point interfaces can be used both with implicit announcement through pointcuts, and with explicit announcement, using closure join points. Join point interfaces further offer polymorphic dispatch on join points, with an advice-dispatch semantics akin to multimethods. To support flexible join point matching, we incorporate into our language an earlier proposal for generic advice, and introduce a mechanism for controlled global quantification. We motivate each language feature in detail, showing that it is necessary to obtain a language design that is both type safe and flexible enough to support typical aspect-oriented programming idioms. We have implemented join point interfaces as an open-source extension to AspectJ. A case study on existing aspect-oriented programs supports our design, and in particular shows the necessity of both generic interfaces and some mechanism for global quantification.

Journal ArticleDOI
TL;DR: The results of both the individual experiments and meta-analysis indicate that UML models produced in the requirements analysis process influence neither the comprehensibility of source code nor its modifiability.
Abstract: We carried out a family of experiments to investigate whether the use of UML models produced in the requirements analysis process helps in the comprehensibility and modifiability of source code. The family consists of a controlled experiment and 3 external replications carried out with students and professionals from Italy and Spain. 86 participants with different abilities and levels of experience with UML took part. The results of the experiments were integrated through the use of meta-analysis. The results of both the individual experiments and meta-analysis indicate that UML models produced in the requirements analysis process influence neither the comprehensibility of source code nor its modifiability.

Journal ArticleDOI
TL;DR: An approach to sensitivity analysis based on exact optimization is introduced and implemented as a tool, OATSAC, which allowed us to experimentally evaluate the scalability and applicability of Requirements Sensitivity Analysis (RSA).
Abstract: The nature of the requirements analysis problem, based as it is on uncertain and often inaccurate estimates of costs and effort, makes sensitivity analysis important. Sensitivity analysis allows the decision maker to identify those requirements and budgets that are particularly sensitive to misestimation. However, finding scalable sensitivity analysis techniques is not easy because the underlying optimization problem is NP-hard. This article introduces an approach to sensitivity analysis based on exact optimization. We implemented this approach as a tool, OATSAC, which allowed us to experimentally evaluate the scalability and applicability of Requirements Sensitivity Analysis (RSA). Our results show that OATSAC scales sufficiently well for practical applications in Requirements Sensitivity Analysis. We also show how the sensitivity analysis can yield insights into difficult and otherwise obscure interactions between budgets, requirements costs, and estimate inaccuracies using a real-world case study.
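The kind of question sensitivity analysis answers can be illustrated with a tiny, hypothetical requirements-selection instance (exhaustive search stands in here for the exact optimization used by OATSAC): perturb the cost estimates and observe which requirements drop out of the optimal selection for a fixed budget.

    from itertools import combinations

    def optimal_selection(costs, values, budget):
        reqs = list(costs)
        best, best_value = frozenset(), 0
        for r in range(len(reqs) + 1):
            for subset in combinations(reqs, r):
                if sum(costs[x] for x in subset) <= budget:
                    v = sum(values[x] for x in subset)
                    if v > best_value:
                        best, best_value = frozenset(subset), v
        return best

    costs = {"R1": 40, "R2": 35, "R3": 30}
    values = {"R1": 60, "R2": 50, "R3": 45}
    baseline = optimal_selection(costs, values, budget=70)
    inflated = optimal_selection({r: c * 1.2 for r, c in costs.items()}, values, 70)
    print(sorted(baseline), sorted(inflated))   # R3 drops out if costs run 20% high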

Journal ArticleDOI
TL;DR: An approach called SynDB is proposed that synthesizes new database interactions to replace the original ones in the database application under test; tests generated with it achieve higher code coverage than existing test generation approaches for database applications.
Abstract: Testing database applications typically requires the generation of tests consisting of both program inputs and database states. Recently, a testing technique called Dynamic Symbolic Execution (DSE) has been proposed to reduce manual effort in test generation for software applications. However, applying DSE to generate tests for database applications faces various technical challenges. For example, the database application under test needs to physically connect to the associated database, which may not be available for various reasons. The program inputs whose values are used to form the executed queries are not treated symbolically, posing difficulties for generating valid database states or appropriate database states for achieving high coverage of query-result-manipulation code. To address these challenges, in this article, we propose an approach called SynDB that synthesizes new database interactions to replace the original ones from the database application under test. In this way, we bridge various constraints within a database application: query-construction constraints, query constraints, database schema constraints, and query-result-manipulation constraints. We then apply a state-of-the-art DSE engine called Pex for .NET from Microsoft Research to generate both program inputs and database states. The evaluation results show that tests generated by our approach can achieve higher code coverage than existing test generation approaches for database applications.
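The idea of bridging the different constraint kinds can be sketched with a toy example (hypothetical variable names and encoding, solved with Z3; not SynDB's actual translation): a program input and a database row are solved for jointly so that the constructed query returns a result that drives the query-result-manipulation branch.

    from z3 import Int, Solver, And, sat

    min_age = Int('min_age')    # program input used to build the WHERE clause
    row_age = Int('row_age')    # synthesized database state (one row of users)

    s = Solver()
    s.add(And(row_age >= 0, row_age <= 150))   # database schema constraint
    s.add(row_age > min_age)                   # query constraint: row matches WHERE age > min_age
    s.add(row_age >= 65)                       # result-manipulation branch: senior-discount path
    s.add(min_age >= 18)                       # query-construction constraint on the input

    if s.check() == sat:
        m = s.model()
        print("program input:", m[min_age], " database row age:", m[row_age])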

Journal ArticleDOI
TL;DR: It is argued that software architecture should play a more prominent role in the development of collaborative applications to help to better manage design complexity by modularizing collaborations and separating concerns.
Abstract: In today's volatile business environments, collaboration between information systems, both within and across company borders, has become essential to success. An efficient supply chain, for example, requires the collaboration of distributed and heterogeneous systems of multiple companies. Developing such collaborative applications and building the supporting information systems poses several engineering challenges. A key challenge is to manage the ever-growing design complexity. In this article, we argue that software architecture should play a more prominent role in the development of collaborative applications. This can help to better manage design complexity by modularizing collaborations and separating concerns. State-of-the-art solutions, however, often lack proper abstractions for modeling collaborations at architectural level or do not reify these abstractions at detailed design and implementation level. Developers, on the other hand, rely on middleware, business process management, and Web services, techniques that mainly focus on low-level infrastructure. To address the problem of managing the design complexity of collaborative applications, we present Macodo. Macodo consists of three complementary parts: (1) a set of abstractions for modeling adaptive collaborations, (2) a set of architectural views, the main contribution of this article, that reify these abstractions at architectural level, and (3) a proof-of-concept middleware infrastructure that supports the architectural abstractions at design and implementation level. We evaluate the architectural views in a controlled experiment. Results show that the use of Macodo can reduce fault density and design complexity, and improve reuse and productivity. The main contributions of this article are illustrated in a supply chain management case.

Journal ArticleDOI
TL;DR: The novelty of DiSE is to combine the efficiencies of static analysis techniques to compute program difference information with the precision of symbolic execution to explore program execution paths and generate path conditions affected by the differences.
Abstract: The last few years have seen a resurgence of interest in the use of symbolic execution—a program analysis technique developed more than three decades ago to analyze program execution paths. Scaling symbolic execution to real systems remains challenging despite recent algorithmic and technological advances. An effective approach to address scalability is to reduce the scope of the analysis. For example, in regression analysis, differences between two related program versions are used to guide the analysis. While such an approach is intuitive, finding efficient and precise ways to identify program differences, and characterize their impact on how the program executes has proved challenging in practice. In this article, we present Directed Incremental Symbolic Execution (DiSE), a novel technique for detecting and characterizing the impact of program changes to scale symbolic execution. The novelty of DiSE is to combine the efficiencies of static analysis techniques to compute program difference information with the precision of symbolic execution to explore program execution paths and generate path conditions affected by the differences. DiSE complements other reduction and bounding techniques for improving symbolic execution. Furthermore, DiSE does not require analysis results to be carried forward as the software evolves—only the source code for two related program versions is required. An experimental evaluation using our implementation of DiSE illustrates its effectiveness at detecting and characterizing the effects of program changes.
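The "directed" part of the idea can be sketched as follows (illustrative graph encoding, not DiSE itself): static analysis marks the branches affected by a change, and exploration skips paths that touch none of them.

    def affected_branches(changed, influences):
        """changed: directly modified nodes; influences: node -> nodes it can affect
        (control/data dependence successors). Returns the affected closure."""
        affected, frontier = set(changed), list(changed)
        while frontier:
            n = frontier.pop()
            for succ in influences.get(n, []):
                if succ not in affected:
                    affected.add(succ)
                    frontier.append(succ)
        return affected

    def paths_worth_exploring(paths, affected):
        return [p for p in paths if affected.intersection(p)]

    influences = {"b1": ["b3"], "b3": ["b4"]}
    affected = affected_branches({"b1"}, influences)
    all_paths = [["b1", "b3"], ["b2", "b5"], ["b2", "b4"]]
    print(paths_worth_exploring(all_paths, affected))   # [['b1', 'b3'], ['b2', 'b4']]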

Journal ArticleDOI
TL;DR: An efficient procedure, based on heuristic search, for checking Milner's strong and weak equivalence; to achieve higher efficiency, the heuristic aims to find a counterexample, even if not the minimum one, to prove nonequivalence.
Abstract: Equivalence checking plays a crucial role in formal verification to ensure the correctness of concurrent systems. However, this method cannot be scaled as easily with the increasing complexity of systems due to the state explosion problem. This article presents an efficient procedure, based on heuristic search, for checking Milner's strong and weak equivalence; to achieve higher efficiency, we actually search for a difference between the two processes that can be discovered as soon as possible, so the heuristic aims to find a counterexample, even if not the minimum one, to prove nonequivalence. The presented algorithm builds the system state graph on-the-fly, during the checking, and the heuristic promotes the construction of the more promising subgraph. The heuristic function is syntax based, but the approach can be applied to different specification languages such as CCS, LOTOS, and CSP, provided that the language semantics is based on the concept of transition. The algorithm to explore the search space of the problem is based on a greedy technique; GreASE (Greedy Algorithm for System Equivalence), the tool supporting the approach, is used to evaluate the achieved reduction of both state-space size and time with respect to other verification environments.
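For contrast with the heuristic on-the-fly search described above, the property being decided can be illustrated by an exhaustive strong-bisimulation check via greatest-fixpoint refinement over toy labelled transition systems (illustrative encoding only):

    def strongly_bisimilar(p0, q0, lts):
        """lts: state -> {label: [successor states]}."""
        rel = {(p, q) for p in lts for q in lts}       # start from all pairs
        changed = True
        while changed:
            changed = False
            for (p, q) in list(rel):
                # every move of p must be matched by q, and vice versa
                fwd = all(any((p2, q2) in rel for q2 in lts[q].get(a, []))
                          for a, succs in lts[p].items() for p2 in succs)
                bwd = all(any((p2, q2) in rel for p2 in lts[p].get(a, []))
                          for a, succs in lts[q].items() for q2 in succs)
                if not (fwd and bwd):
                    rel.discard((p, q))
                    changed = True
        return (p0, q0) in rel

    # a.(b + c) versus a.b + a.c: classically not strongly bisimilar
    lts = {"P": {"a": ["P1"]}, "P1": {"b": ["0"], "c": ["0"]},
           "Q": {"a": ["Q1", "Q2"]}, "Q1": {"b": ["0"]}, "Q2": {"c": ["0"]}, "0": {}}
    print(strongly_bisimilar("P", "Q", lts))   # False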

Journal ArticleDOI
TL;DR: A general framework for designing and implementing scalable analysis algorithms to find causes of bloat in Java programs is presented; at its heart is a generalized form of runtime dependence graph computed by abstract dynamic slicing, a semantics-aware technique that achieves high scalability by performing dynamic slicing over bounded abstract domains.
Abstract: Many large-scale Java applications suffer from runtime bloat. They execute large volumes of methods and create many temporary objects, all to execute relatively simple operations. There are large opportunities for performance optimizations in these applications, but most are being missed by existing optimization and tooling technology. While JIT optimizations struggle for a few percent improvement, performance experts analyze deployed applications and regularly find gains of 2× or more. Finding such big gains is difficult, for both humans and compilers, because of the diffuse nature of runtime bloat. Time is spread thinly across calling contexts, making it difficult to judge how to improve performance. Our experience shows that, in order to identify large performance bottlenecks in a program, it is more important to understand its dynamic dataflow than traditional performance metrics, such as running time. This article presents a general framework for designing and implementing scalable analysis algorithms to find causes of bloat in Java programs. At the heart of this framework is a generalized form of runtime dependence graph computed by abstract dynamic slicing, a semantics-aware technique that achieves high scalability by performing dynamic slicing over bounded abstract domains. The framework is instantiated to create two independent dynamic analyses, copy profiling and cost-benefit analysis, that help programmers identify performance bottlenecks by identifying, respectively, high-volume copy activities and data structures that have high construction cost but low benefit for the forward execution. We have successfully applied these analyses to large-scale and long-running Java applications. We show that both analyses are effective at detecting inefficient operations that can be optimized for better performance. We also demonstrate that the general framework is flexible enough to be instantiated for dynamic analyses in a variety of application domains.

Journal ArticleDOI
TL;DR: The temporal logic CTLcc is developed that extends Computation Tree Logic with new modalities which allow representing and reasoning about two types of communicating conditional commitments and their fulfillments using the formalism of interpreted systems.
Abstract: While modeling interactions using social commitments provides a fundamental basis for capturing flexible and declarative interactions and helps in addressing the challenge of ensuring compliance with specifications, the designers of the system cannot guarantee that an agent complies with its commitments as it is supposed to, or at least an agent doesn't want to violate its commitments. They may still wish to develop efficient and scalable algorithms by which model checking conditional commitments, a natural and universal frame of social commitments, is feasible at design time. However, distinguishing between different but related types of conditional commitments, and developing dedicated algorithms to tackle the problem of model checking conditional commitments, is still an active research topic. In this article, we develop the temporal logic CTLcc that extends Computation Tree Logic (CTL) with new modalities which allow representing and reasoning about two types of communicating conditional commitments and their fulfillments using the formalism of interpreted systems. We introduce a set of rules to reason about conditional commitments and their fulfillments. The verification technique is based on developing a new symbolic model checking algorithm to address this verification problem. We analyze the computational complexity and present the full implementation of the developed algorithm on top of the MCMAS model checker. We also evaluate the algorithm's effectiveness and scalability by verifying the compliance of the NetBill protocol, taken from the business domain, and the process of breast cancer diagnosis and treatment, taken from the health-care domain, with specifications expressed in CTLcc. We finally compare the experimental results with existing proposals.

Journal ArticleDOI
TL;DR: In this paper, the authors present an algorithm, along with a tool QKS implementing it, that from a formal model (as a discrete-time linear hybrid system) of the controlled system (plant), implementation specifications (that is, number of bits in the Analog-to-Digital, AD, conversion) and system-level formal specifications, returns correct-by-construction control software that has a Worst-Case Execution Time (WCET) linear in the number of AD bits and meets the given specifications.
Abstract: Many embedded systems are indeed software-based control systems, that is, control systems whose controller consists of control software running on a microcontroller device. This motivates investigation on formal model-based design approaches for automatic synthesis of embedded systems control software. We present an algorithm, along with a tool QKS implementing it, that from a formal model (as a discrete-time linear hybrid system) of the controlled system (plant), implementation specifications (that is, number of bits in the Analog-to-Digital, AD, conversion) and system-level formal specifications (that is, safety and liveness requirements for the closed loop system) returns correct-by-construction control software that has a Worst-Case Execution Time (WCET) linear in the number of AD bits and meets the given specifications. We show feasibility of our approach by presenting experimental results on using it to synthesize control software for a buck DC-DC converter, a widely used mixed-mode analog circuit, and for the inverted pendulum.

Journal ArticleDOI
TL;DR: This article proposes a model-based configuration approach that provides automation support for reducing configuration effort and the likelihood of configuration errors in the ICS domain, and shows that it can automatically infer up to 50% of the configuration decisions, and reduces the complexity of making configuration decisions.
Abstract: Configuration in the domain of Integrated Control Systems (ICS) is largely manual, laborious, and error prone. In this article, we propose a model-based configuration approach that provides automation support for reducing configuration effort and the likelihood of configuration errors in the ICS domain. We ground our approach on component-based specifications of ICS families. We then develop a configuration algorithm using constraint satisfaction techniques over finite domains to generate products that are consistent with respect to their ICS family specifications. We reason about the termination and consistency of our configuration algorithm analytically. We evaluate the effectiveness of our configuration approach by applying it to a real subsea oil production system. Specifically, we have rebuilt a number of existing verified product configurations of our industry partner. Our experience shows that our approach can automatically infer up to 50% of the configuration decisions, and reduces the complexity of making configuration decisions.
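The inference of forced decisions can be sketched with a toy component model and a constraint solver (the rules and component names are made up; the article targets ICS family specifications): a decision is inferred automatically when only one value remains consistent with the family rules and the choices already made.

    from z3 import Bools, Solver, Implies, Not, unsat

    redundant_pump, dual_controller, extra_io_card = Bools(
        'redundant_pump dual_controller extra_io_card')

    family_rules = [Implies(redundant_pump, dual_controller),   # family specification
                    Implies(dual_controller, extra_io_card)]

    def inferred_decisions(rules, user_choices, variables):
        forced = {}
        for v in variables:
            for val in (True, False):
                s = Solver()
                s.add(rules + user_choices)
                s.add(Not(v) if val else v)      # assert the opposite of val
                if s.check() == unsat:           # opposite is inconsistent -> forced
                    forced[str(v)] = val
        return forced

    # The engineer only decides to use a redundant pump; the rest is inferred.
    print(inferred_decisions(family_rules, [redundant_pump],
                             [dual_controller, extra_io_card]))
    # {'dual_controller': True, 'extra_io_card': True}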

Journal ArticleDOI
TL;DR: This article defines implementation relation diococ for the centralised approach and proves that dioco and diococ are incomparable; it also proves that the Oracle problem is NP-complete for diococ and diocos but can be solved in polynomial time if the number of ports is bounded.
Abstract: Many systems interact with their environment at distributed interfaces (ports) and sometimes it is not possible to place synchronised local testers at the ports of the system under test (SUT). There are then two main approaches to testing: having independent local testers or a single centralised tester that interacts asynchronously with the SUT. The power of using independent testers has been captured using implementation relation dioco. In this article, we define implementation relation diococ for the centralised approach and prove that dioco and diococ are incomparable. This shows that the frameworks detect different types of faults and so we devise a hybrid framework and define an implementation relation diocos for this. We prove that the hybrid framework is more powerful than the distributed and centralised approaches. We then prove that the Oracle problem is NP-complete for diococ and diocos but can be solved in polynomial time if we place an upper bound on the number of ports. Finally, we consider the problem of deciding whether there is a test case that is guaranteed to force a finite state model into a particular state or to distinguish two states, proving that both problems are undecidable for the centralised and hybrid frameworks.

Journal ArticleDOI
TL;DR: The technique consists of collecting constraints on interfaces, annotations, and reflection, combining them with program constraints collected during dynamic symbolic execution, encoding them in a constraint system, solving them with an off-the-shelf constraint solver, and mapping constraint solutions to test cases and custom mock classes.
Abstract: Automatic test case generation for software programs is very powerful but suffers from a key limitation. That is, most current test case generation techniques fail to cover testee code when covering that code requires additional pieces of code not yet part of the program under test. To address some of these cases, the Pex state-of-the-art test case generator can generate basic mock code. However, current test case generators cannot handle cases in which the code under test uses multiple interfaces, annotations, or reflection. To cover such code in an object-oriented setting, we describe a novel technique for generating test cases and mock classes. The technique consists of collecting constraints on interfaces, annotations, and reflection, combining them with program constraints collected during dynamic symbolic execution, encoding them in a constraint system, solving them with an off-the-shelf constraint solver, and mapping constraint solutions to test cases and custom mock classes. We demonstrate the value of this technique on open-source applications. Our approach covered such third-party code with generated mock classes, while competing approaches failed to cover the code and sometimes produced unintended side-effects such as filling the screen with dialog boxes and writing into the file system.
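The flavour of solver-backed mocks can be sketched in Python rather than the paper's .NET/Pex setting (all names are hypothetical): a constraint collected from a hard-to-reach branch determines the return value of a generated stub class.

    from z3 import Int, Solver, sat

    def code_under_test(limit_provider):
        limit = limit_provider.max_retries()
        if limit * 3 > 100:                  # branch that needs a specific mock value
            return "aggressive-retry path"
        return "normal path"

    def make_mock_for_branch():
        v = Int('v')
        s = Solver()
        s.add(v * 3 > 100)                   # constraint collected from the branch
        assert s.check() == sat
        chosen = s.model()[v].as_long()
        # generate a stub class whose method returns the solver-chosen value
        return type("MockLimitProvider", (), {"max_retries": lambda self: chosen})()

    print(code_under_test(make_mock_for_branch()))   # aggressive-retry path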

Journal ArticleDOI
TL;DR: Although the most common security pitfalls are well described and categorized in the literature, addressing these security risks remains a challenge for Web application programmers.
Abstract: Web application programmers must be aware of a wide range of potential security risks. Although the most common pitfalls are well described and categorized in the literature, it remains a challengi...

Journal ArticleDOI
TL;DR: Automated Cookie Collection Testing (CCT) is presented, and is empirically demonstrated to be a low-cost and highly effective automated testing solution for modern Web applications.
Abstract: Cookies are used by over 80% of Web applications utilizing dynamic Web application frameworks. Applications deploying cookies must be rigorously verified to ensure that the application is robust and secure. Given the intense time-to-market pressures faced by modern Web applications, testing strategies that are low cost and automatable are required. Automated Cookie Collection Testing (CCT) is presented, and is empirically demonstrated to be a low-cost and highly effective automated testing solution for modern Web applications. Automatable test oracles and evaluation metrics specifically designed for Web applications are presented, and are shown to be significant diagnostic tests. Automated CCT is shown to detect faults within five real-world Web applications. A case study of over 580 test results for a single application is presented demonstrating that automated CCT is an effective testing strategy. Moreover, CCT is found to detect security bugs in a Web application released into full production.
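A simplified cookie-manipulation check in the spirit of CCT might look as follows (hypothetical URL and a crude oracle; not the paper's full technique or metrics): collect the cookies a page sets, then replay the request with each cookie dropped and compare the responses.

    import requests

    def cookie_sensitivity(url):
        baseline = requests.get(url)
        cookies = dict(baseline.cookies)
        results = {}
        for name in cookies:
            reduced = {k: v for k, v in cookies.items() if k != name}
            replay = requests.get(url, cookies=reduced)
            # crude oracle: a status or body-length change suggests the cookie matters
            results[name] = (replay.status_code != baseline.status_code
                             or len(replay.text) != len(baseline.text))
        return results

    if __name__ == "__main__":
        print(cookie_sensitivity("http://localhost:8000/app"))   # hypothetical app URL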