
Showing papers in "IEEE Transactions on Software Engineering in 2010"


Journal ArticleDOI
TL;DR: DECOR is proposed, a method that embodies and defines all the steps necessary for the specification and detection of code and design smells, together with DETEX, a detection technique that instantiates this method, and an empirical validation of DETEX in terms of precision and recall.
Abstract: Code and design smells are poor solutions to recurring implementation and design problems. They may hinder the evolution of a system by making it hard for software engineers to carry out changes. We propose three contributions to the research field related to code and design smells: (1) DECOR, a method that embodies and defines all the steps necessary for the specification and detection of code and design smells, (2) DETEX, a detection technique that instantiates this method, and (3) an empirical validation in terms of precision and recall of DETEX. The originality of DETEX stems from the ability for software engineers to specify smells at a high level of abstraction using a consistent vocabulary and domain-specific language for automatically generating detection algorithms. Using DETEX, we specify four well-known design smells: the antipatterns Blob, Functional Decomposition, Spaghetti Code, and Swiss Army Knife, and their 15 underlying code smells, and we automatically generate their detection algorithms. We apply and validate the detection algorithms in terms of precision and recall on XERCES v2.7.0, and discuss the precision of these algorithms on 11 open-source systems.

710 citations
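
To give a flavor of what a metric-based smell rule looks like once it has been turned into a detection algorithm, here is a minimal, hypothetical sketch of a Blob check in Python. The metrics, class names, and thresholds are invented for illustration and are not DECOR's specification or DETEX's generated code.

```python
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    name: str
    nmd_nad: int            # number of methods + attributes (a size measure)
    lcom5: float            # lack of cohesion in methods (1.0 = no cohesion)
    data_class_assocs: int  # associations to data classes (accessors only)

def is_blob(c: ClassMetrics, size_threshold: int = 40, lcom_threshold: float = 0.8) -> bool:
    """Rule-of-thumb Blob check: a very large, non-cohesive controller class
    surrounded by data classes. Thresholds are illustrative, not DECOR's."""
    return (c.nmd_nad >= size_threshold
            and c.lcom5 >= lcom_threshold
            and c.data_class_assocs >= 1)

# Hypothetical classes of a system under analysis.
candidates = [
    ClassMetrics("XMLParserController", nmd_nad=72, lcom5=0.93, data_class_assocs=4),
    ClassMetrics("TokenBuffer", nmd_nad=12, lcom5=0.35, data_class_assocs=0),
]
print([c.name for c in candidates if is_blob(c)])  # ['XMLParserController']
```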


Journal ArticleDOI
TL;DR: The intent is to aid future researchers doing empirical studies in SBST by providing an unbiased view of the body of empirical evidence and by guiding them in performing well-designed and executed empirical studies.
Abstract: Metaheuristic search techniques have been extensively used to automate the process of generating test cases, and thus providing solutions for a more cost-effective testing process. This approach to test automation, often coined “Search-based Software Testing” (SBST), has been used for a wide variety of test case generation purposes. Since SBST techniques are heuristic by nature, they must be empirically investigated in terms of how costly and effective they are at reaching their test objectives and whether they scale up to realistic development artifacts. However, approaches to empirically study SBST techniques have shown wide variation in the literature. This paper presents the results of a systematic, comprehensive review that aims at characterizing how empirical studies have been designed to investigate SBST cost-effectiveness and what empirical evidence is available in the literature regarding SBST cost-effectiveness and scalability. We also provide a framework that drives the data collection process of this systematic review and can be the starting point of guidelines on how SBST techniques can be empirically assessed. The intent is to aid future researchers doing empirical studies in SBST by providing an unbiased view of the body of empirical evidence and by guiding them in performing well-designed and executed empirical studies.

446 citations


Journal ArticleDOI
TL;DR: A theoretical exploration of the most widely studied approach, the global search technique embodied by Genetic Algorithms, reveals that there exist instances of the test data generation problem that suit each algorithm, thereby suggesting that a hybrid global-local search (a Memetic Algorithm) may be appropriate.
Abstract: Search-based optimization techniques have been applied to structural software test data generation since 1992, with a recent upsurge in interest and activity within this area. However, despite the large number of recent studies on the applicability of different search-based optimization approaches, there has been very little theoretical analysis of the types of testing problem for which these techniques are well suited. There are also few empirical studies that present results for larger programs. This paper presents a theoretical exploration of the most widely studied approach, the global search technique embodied by Genetic Algorithms. It also presents results from a large empirical study that compares the behavior of both global and local search-based optimization on real-world programs. The results of this study reveal that there exist instances of the test data generation problem that suit each algorithm, thereby suggesting that a hybrid global-local search (a Memetic Algorithm) may be appropriate. The paper presents a Memetic Algorithm along with further empirical results studying its performance.

391 citations
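
As a toy illustration of why a hybrid of global and local search can pay off, the sketch below evolves an integer input toward covering a single equality branch by minimizing a branch-distance fitness and refining offspring with hill climbing. The target predicate, population settings, and operators are all invented; this is not the paper's implementation.

```python
import random

TARGET = 4242  # hypothetical branch predicate: if x == 4242

def branch_distance(x: int) -> int:
    """Fitness: distance to satisfying the branch predicate (0 = covered)."""
    return abs(x - TARGET)

def hill_climb(x: int) -> int:
    """Local search: move to a better neighbor while one exists."""
    while True:
        better = min((x - 1, x + 1), key=branch_distance)
        if branch_distance(better) >= branch_distance(x):
            return x
        x = better

def memetic_search(pop_size: int = 20, generations: int = 50) -> int:
    pop = [random.randint(0, 10_000) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=branch_distance)
        if branch_distance(pop[0]) == 0:
            break
        parents = pop[: pop_size // 2]
        children = [random.choice(parents) + random.randint(-100, 100)
                    for _ in range(pop_size - len(parents))]
        # Memetic step: refine offspring with local search before the next generation.
        pop = parents + [hill_climb(c) for c in children]
    return min(pop, key=branch_distance)

print(memetic_search())  # expected to print 4242 (branch covered)
```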


Journal ArticleDOI
TL;DR: The CUEZILLA prototype is a tool that measures the quality of new bug reports and recommends which elements should be added to improve it; the paper also discusses several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
Abstract: In software development, bug reports provide crucial information to developers. However, these reports widely differ in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases as helpful, which are, at the same time, most difficult to provide for users. Such insight is helpful for designing new bug tracking tools that guide users in collecting and providing more helpful information. Our CUEZILLA prototype is such a tool and measures the quality of new bug reports; it also recommends which elements should be added to improve the quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. The participants of our survey also provided 175 comments on hurdles in reporting and resolving bugs. Based on these comments, we discuss several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.

298 citations
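
The idea of scoring a report by the presence of the elements developers find most helpful can be sketched with a simple keyword-based scorer. The regular expressions and weights below are hypothetical; CUEZILLA itself is trained on developer-rated reports rather than hand-tuned rules.

```python
import re

# Hypothetical weights reflecting the survey finding that steps to reproduce,
# stack traces, and test cases are the most helpful elements.
INDICATORS = {
    "steps_to_reproduce": (re.compile(r"steps to reproduce|1\.\s.+\n\s*2\.", re.I), 0.4),
    "stack_trace":        (re.compile(r"at \w+(\.\w+)+\(.*\)|Traceback", re.I), 0.35),
    "test_case":          (re.compile(r"test ?case|unit test", re.I), 0.25),
}

def report_quality(text: str) -> tuple[float, list[str]]:
    """Return a 0..1 quality score plus the missing elements to recommend."""
    score, missing = 0.0, []
    for name, (pattern, weight) in INDICATORS.items():
        if pattern.search(text):
            score += weight
        else:
            missing.append(name)
    return score, missing

example = "Crash on save.\nSteps to reproduce:\n1. Open file\n2. Press Ctrl+S"
print(report_quality(example))  # partial score; recommends stack trace and test case
```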


Journal ArticleDOI
TL;DR: An automated readability measure is constructed and shown to be 80 percent effective, and better than a human on average, at predicting readability judgments; it also correlates strongly with three measures of software quality.
Abstract: In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from 120 human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80 percent effective and better than a human, on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages. We measure these correlations on over 2.2 million lines of code, as well as longitudinally, over many releases of selected projects. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.

284 citations
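
As a rough, hand-weighted stand-in for the learned metric, the sketch below computes a few of the local features the paper discusses (line length, identifier length, blank lines) and combines them with invented weights; the published measure is instead fitted to the annotators' judgments.

```python
import re

def local_features(snippet: str) -> dict:
    lines = snippet.splitlines() or [""]
    return {
        "avg_line_length": sum(len(l) for l in lines) / len(lines),
        "max_identifier_length": max(
            (len(t) for t in re.findall(r"[A-Za-z_]\w*", snippet)), default=0),
        "blank_line_ratio": sum(1 for l in lines if not l.strip()) / len(lines),
    }

def readability_score(snippet: str) -> float:
    """Higher is more readable. Weights are illustrative, not the learned model."""
    f = local_features(snippet)
    score = 1.0
    score -= 0.01 * max(0, f["avg_line_length"] - 40)        # penalize long lines
    score -= 0.02 * max(0, f["max_identifier_length"] - 20)  # penalize huge identifiers
    score += 0.5 * f["blank_line_ratio"]                     # blank lines help locally
    return max(0.0, min(1.0, score))

print(readability_score("int n = 0;\n\nfor (int i = 0; i < 10; i++) {\n    n += i;\n}\n"))
```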


Journal ArticleDOI
TL;DR: It is concluded that more effort should be spent on investigating other performance-related predictors such as expertise and task complexity, as well as other promising predictors such as programming skill and learning.
Abstract: Personality tests in various guises are commonly used in recruitment and career counseling industries. Such tests have also been considered as instruments for predicting the job performance of software professionals both individually and in teams. However, research suggests that other human-related factors such as motivation, general mental ability, expertise, and task complexity also affect performance in general. This paper reports on a study of the impact of the Big Five personality traits on the performance of pair programmers together with the impact of expertise and task complexity. The study involved 196 software professionals in three countries forming 98 pairs. The analysis consisted of a confirmatory part and an exploratory part. The results show that: (1) Our data do not confirm a meta-analysis-based model of the impact of certain personality traits on performance and (2) personality traits, in general, have modest predictive value on pair programming performance compared with expertise, task complexity, and country. We conclude that more effort should be spent on investigating other performance-related predictors such as expertise and task complexity, as well as other promising predictors such as programming skill and learning. We also conclude that effort should be spent on elaborating on the effects of personality on various measures of collaboration, which, in turn, may be used to predict and influence performance. Insights into such malleable, rather than static, factors may then be used to improve pair programming performance.

195 citations


Journal ArticleDOI
TL;DR: A dynamic test generation technique for the domain of dynamic Web applications is presented; it combines concrete and symbolic execution with explicit-state model checking and minimizes the conditions on the inputs to failing tests so that the resulting bug reports are small and useful in finding and fixing the underlying faults.
Abstract: Web script crashes and malformed dynamically generated webpages are common errors, and they seriously impact the usability of Web applications. Current tools for webpage validation cannot handle the dynamically generated pages that are ubiquitous on today's Internet. We present a dynamic test generation technique for the domain of dynamic Web applications. The technique utilizes both combined concrete and symbolic execution and explicit-state model checking. The technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests so that the resulting bug reports are small and useful in finding and fixing the underlying faults. Our tool Apollo implements the technique for the PHP programming language. Apollo generates test inputs for a Web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo's algorithms and implementation, and an experimental evaluation that revealed 673 faults in six PHP Web applications.

173 citations
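
The minimization step can be pictured as repeatedly dropping input conditions while the failure persists. The sketch below is a generic, greatly simplified version of that idea; `fails` is a hypothetical oracle standing in for re-executing the application on an input satisfying the remaining conditions.

```python
def minimize_conditions(conditions, fails):
    """Greedily drop input conditions while the remaining set still triggers the failure.

    `conditions` is a list of constraint strings on the input; `fails(conds)` is an
    oracle that stands in for re-running the application on an input satisfying
    `conds` and checking whether the failure is still observed. One-at-a-time
    removal is a simplification of the paper's minimization.
    """
    minimized = list(conditions)
    changed = True
    while changed:
        changed = False
        for c in list(minimized):
            trial = [x for x in minimized if x != c]
            if fails(trial):
                minimized = trial
                changed = True
    return minimized

# Hypothetical failure that only needs two of the four collected conditions.
def fails(conds):
    return "page == 'admin'" in conds and "id == ''" in conds

initial = ["page == 'admin'", "id == ''", "lang == 'en'", "debug == '1'"]
print(minimize_conditions(initial, fails))  # ["page == 'admin'", "id == ''"]
```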


Journal ArticleDOI
TL;DR: A series of experiments assesses the effects of time constraints on the costs and benefits of prioritization techniques and shows that time constraints play a significant role in determining both the cost-effectiveness of prioritization and the relative cost-benefit trade-offs among techniques.
Abstract: Regression testing is an expensive process used to validate modified software. Test case prioritization techniques improve the cost-effectiveness of regression testing by ordering test cases such that those that are more important are run earlier in the testing process. Many prioritization techniques have been proposed and evidence shows that they can be beneficial. It has been suggested, however, that the time constraints that can be imposed on regression testing by various software development processes can strongly affect the behavior of prioritization techniques. If this is correct, a better understanding of the effects of time constraints could lead to improved prioritization techniques and improved maintenance and testing processes. We therefore conducted a series of experiments to assess the effects of time constraints on the costs and benefits of prioritization techniques. Our first experiment manipulates time constraint levels and shows that time constraints do play a significant role in determining both the cost-effectiveness of prioritization and the relative cost-benefit trade-offs among techniques. Our second experiment replicates the first experiment, controlling for several threats to validity including numbers of faults present, and shows that the results generalize to this wider context. Our third experiment manipulates the number of faults present in programs to examine the effects of faultiness levels on prioritization and shows that faultiness level affects the relative cost-effectiveness of prioritization techniques. Taken together, these results have several implications for test engineers wishing to cost-effectively regression test their software systems. These include suggestions about when and when not to prioritize, what techniques to employ, and how differences in testing processes may relate to prioritization cost-effectiveness.

162 citations
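
To make the interaction between prioritization and a time budget concrete, here is a deliberately simple greedy ordering by additional coverage per unit time, truncated to the budget. It shows the general shape of such techniques, not one of the specific techniques studied in the experiments; the test data are invented.

```python
def prioritize(tests, time_budget):
    """Greedy 'additional coverage per second' ordering, truncated to the budget.

    `tests` maps a test name to (runtime_seconds, set_of_covered_statements).
    """
    remaining = dict(tests)
    covered, order, spent = set(), [], 0.0
    while remaining:
        def gain(name):
            runtime, cov = remaining[name]
            return len(cov - covered) / runtime
        best = max(remaining, key=gain)
        runtime, cov = remaining.pop(best)
        if spent + runtime > time_budget:
            continue  # this test no longer fits in the time budget
        order.append(best)
        covered |= cov
        spent += runtime
    return order, covered

tests = {
    "t1": (5.0, {1, 2, 3, 4}),
    "t2": (1.0, {3, 4}),
    "t3": (2.0, {5, 6, 7}),
}
print(prioritize(tests, time_budget=4.0))  # picks t2 and t3 under a tight budget
```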


Journal ArticleDOI
TL;DR: This study provides clear guidance to practitioners interested in exploiting their organization's software measurement data repositories for improved software quality modeling.
Abstract: A novel search-based approach to software quality modeling with multiple software project repositories is presented. Training a software quality model with only one software measurement and defect data set may not effectively encapsulate quality trends of the development organization. The inclusion of additional software projects during the training process can provide a cross-project perspective on software quality modeling and prediction. The genetic-programming-based approach includes three strategies for modeling with multiple software projects: Baseline Classifier, Validation Classifier, and Validation-and-Voting Classifier. The latter is shown to provide better generalization and more robust software quality models. This is based on a case study of software metrics and defect data from seven real-world systems. A second case study considers 17 different (nonevolutionary) machine learners for modeling with multiple software data sets. Both case studies use a similar majority-voting approach for predicting fault-proneness class of program modules. It is shown that the total cost of misclassification of the search-based software quality models is consistently lower than those of the non-search-based models. This study provides clear guidance to practitioners interested in exploiting their organization's software measurement data repositories for improved software quality modeling.

151 citations
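
The majority-voting idea itself is easy to sketch: train one classifier per project data set and predict a module as fault-prone when most models agree. The trivial LOC-threshold "model" below merely stands in for the paper's genetic-programming classifiers, and the data are invented.

```python
def train_threshold_classifier(dataset):
    """Toy per-project 'model': flag a module as fault-prone when its LOC
    exceeds the project's mean LOC. Stands in for the paper's GP models."""
    mean_loc = sum(loc for loc, _ in dataset) / len(dataset)
    return lambda loc: loc > mean_loc

def majority_vote(classifiers, loc):
    """Predict fault-proneness when more than half of the project models agree."""
    votes = sum(1 for clf in classifiers if clf(loc))
    return votes > len(classifiers) / 2

# Hypothetical (LOC, faulty?) data from three different projects.
projects = [
    [(120, True), (40, False), (300, True), (60, False)],
    [(500, True), (90, False), (220, True)],
    [(80, False), (150, True), (70, False), (400, True)],
]
classifiers = [train_threshold_classifier(p) for p in projects]
print(majority_vote(classifiers, loc=260))  # True: most project models agree
```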


Journal ArticleDOI
TL;DR: A statistical model derived from logistic regression is used to identify threshold values for the Chidamber and Kemerer (CK) metrics; the findings suggest that there is a relationship between risk levels and object-oriented metrics and that risk levels can be used to identify threshold effects.
Abstract: Object-oriented metrics have been validated empirically as measures of design complexity. These metrics can be used to mitigate potential problems related to software complexity. However, few studies have been conducted to formulate guidelines, represented as threshold values, for interpreting the complexity of the software design using metrics. Classes can be clustered into low and high risk levels using threshold values. In this paper, we use a statistical model, derived from logistic regression, to identify threshold values for the Chidamber and Kemerer (CK) metrics. The methodology is validated empirically on a large open-source system, the Eclipse project. The empirical results indicate that the CK metrics have threshold effects at various risk levels. We have validated the use of these thresholds on the next release of the Eclipse project, Version 2.1, using decision trees. In addition, the selected threshold values were more accurate than those selected based on either intuitive perspectives or on data distribution parameters. Furthermore, the proposed model can be exploited to find the risk level for an arbitrary threshold value. These findings suggest that there is a relationship between risk levels and object-oriented metrics and that risk levels can be used to identify threshold effects.

138 citations
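
The threshold derivation can be stated in one line: for a fitted logistic model p(fault) = 1 / (1 + e^-(α + βx)), the metric value at which the predicted risk reaches a chosen level p0 is x = (ln(p0 / (1 - p0)) - α) / β. The coefficients below are invented for illustration; the paper estimates them from Eclipse data.

```python
import math

def threshold_for_risk(alpha: float, beta: float, p0: float) -> float:
    """Metric value x at which the logistic model's predicted fault probability
    reaches risk level p0, i.e. p0 = 1 / (1 + exp(-(alpha + beta * x)))."""
    return (math.log(p0 / (1 - p0)) - alpha) / beta

# Hypothetical coefficients for a CK metric such as WMC (not the paper's values).
alpha, beta = -2.5, 0.08
for risk in (0.5, 0.65, 0.8):
    print(f"risk {risk:.2f} -> WMC threshold {threshold_for_risk(alpha, beta, risk):.1f}")
```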


Journal ArticleDOI
TL;DR: The feedback-based technique is able to significantly improve existing techniques and helps identify serious problems in the software; moreover, the ESI relationships captured via GUI state yield test suites that most often detect more faults than their code-, event-, and event-interaction-coverage equivalent counterparts.
Abstract: This paper presents a fully automatic model-driven technique to generate test cases for graphical user interface (GUI)-based applications. The technique uses feedback from the execution of a "seed test suite," which is generated automatically using an existing structural event interaction graph model of the GUI. During its execution, the runtime effect of each GUI event on all other events pinpoints event semantic interaction (ESI) relationships, which are used to automatically generate new test cases. Two studies on eight applications demonstrate that the feedback-based technique 1) is able to significantly improve existing techniques and helps identify serious problems in the software and 2) the ESI relationships captured via GUI state yield test suites that most often detect more faults than their code-, event-, and event-interaction-coverage equivalent counterparts.
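
At a very coarse level, the feedback step turns observed pairwise event interactions into longer test sequences. The sketch below shows only that expansion step, on hypothetical events and ESI pairs; it is not the paper's algorithm and ignores GUI state entirely.

```python
# Hypothetical event-semantic-interaction pairs observed while running the
# seed suite: executing the first event changed the effect of the second.
esi_pairs = {("open_file", "save"), ("select_all", "delete"), ("save", "close")}

def extend_sequences(seed_sequences, esi_pairs):
    """Grow each test case with events that semantically interact with its
    last event, producing new, longer test cases."""
    new_cases = []
    for seq in seed_sequences:
        for first, second in esi_pairs:
            if seq[-1] == first:
                new_cases.append(seq + [second])
    return new_cases

seeds = [["open_file"], ["select_all"]]
gen2 = extend_sequences(seeds, esi_pairs)
gen3 = extend_sequences(gen2, esi_pairs)
print(gen2)  # [['open_file', 'save'], ['select_all', 'delete']]
print(gen3)  # [['open_file', 'save', 'close']]
```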

Journal ArticleDOI
TL;DR: The results suggest that the MOGA can help correct suboptimal class responsibility assignment decisions and perform far better than simpler alternative heuristics such as hill climbing and a single-objective GA.
Abstract: In the context of object-oriented analysis and design (OOAD), class responsibility assignment is not an easy skill to acquire. Though there are many methodologies for assigning responsibilities to classes, they all rely on human judgment and decision making. Our objective is to provide decision-making support to reassign methods and attributes to classes in a class diagram. Our solution is based on a multi-objective genetic algorithm (MOGA) and uses class coupling and cohesion measurement for defining fitness functions. Our MOGA takes as input a class diagram to be optimized and suggests possible improvements to it. The choice of a MOGA stems from the fact that there are typically many evaluation criteria that cannot be easily combined into one objective, and several alternative solutions are acceptable for a given OO domain model. Using a carefully selected case study, this paper investigates the application of our proposed MOGA to the class responsibility assignment problem, in the context of object-oriented analysis and domain class models. Our results suggest that the MOGA can help correct suboptimal class responsibility assignment decisions and perform far better than simpler alternative heuristics such as hill climbing and a single-objective GA.
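
The fitness side of such a search can be made concrete with a toy evaluation of one candidate assignment of methods and attributes to classes: count cross-class dependencies (coupling) and the share of related intra-class member pairs (cohesion). The dependency data and both measures are simplified stand-ins for the paper's measurement suite.

```python
from itertools import combinations

# Hypothetical member-level dependencies (member -> members it uses).
uses = {
    "printInvoice": {"getTotal", "taxRate"},
    "getTotal": {"lineItems"},
    "addItem": {"lineItems"},
    "taxRate": set(), "lineItems": set(),
}

def coupling(assignment):
    """Count dependencies that cross class boundaries (lower is better)."""
    owner = {m: c for c, members in assignment.items() for m in members}
    return sum(1 for m, deps in uses.items() for d in deps if owner[m] != owner[d])

def cohesion(assignment):
    """Fraction of intra-class member pairs sharing a dependency (higher is better)."""
    related, pairs = 0, 0
    for members in assignment.values():
        for a, b in combinations(sorted(members), 2):
            pairs += 1
            if b in uses.get(a, set()) or a in uses.get(b, set()):
                related += 1
    return related / pairs if pairs else 1.0

candidate = {"Invoice": {"printInvoice", "getTotal", "lineItems", "addItem"},
             "TaxPolicy": {"taxRate"}}
print(coupling(candidate), round(cohesion(candidate), 2))  # two objectives for the MOGA
```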

Journal ArticleDOI
TL;DR: A framework for reputation-aware software service selection and rating is described, and an automated rating model, based on the expectancy-disconfirmation theory from market science, is defined to overcome feedback subjectivity issues.
Abstract: The integration of external software in project development is challenging and risky, notably because the execution quality of the software and the trustworthiness of the software provider may be unknown at integration time. This is a timely problem and of increasing importance with the advent of the SaaS model of service delivery. Therefore, in choosing the SaaS service to utilize, project managers must identify and evaluate the level of risk associated with each candidate. Trust is commonly assessed through reputation systems; however, existing systems rely on ratings provided by consumers. This raises numerous issues involving the subjectivity and unfairness of the service ratings. This paper describes a framework for reputation-aware software service selection and rating. A selection algorithm is devised for service recommendation, providing SaaS consumers with the best possible choices based on quality, cost, and trust. An automated rating model, based on the expectancy-disconfirmation theory from market science, is also defined to overcome feedback subjectivity issues. The proposed rating and selection models are validated through simulations, demonstrating that the system can effectively capture service behavior and recommend the best possible choices.
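
Expectancy-disconfirmation theory rates a provider by comparing the quality actually observed against the quality that was expected. The linear update below is only a made-up illustration of that idea, not the paper's rating model.

```python
def automated_rating(expected: float, observed: float, prior_reputation: float,
                     weight: float = 0.3) -> float:
    """Update a provider's reputation from the disconfirmation between the
    quality level promised (`expected`) and the level actually measured at
    runtime (`observed`), all on a 0..1 scale. The linear update and the
    weight are illustrative choices, not the paper's formulas."""
    disconfirmation = observed - expected          # > 0: better than promised
    updated = prior_reputation + weight * disconfirmation
    return max(0.0, min(1.0, updated))

# Hypothetical SaaS provider promising 0.99 availability but delivering 0.95.
print(automated_rating(expected=0.99, observed=0.95, prior_reputation=0.8))
```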

Journal ArticleDOI
TL;DR: Algorithms for constructing PPDGs and applying them to fault diagnosis are presented, together with preliminary evidence indicating that a PPDG-based fault localization technique compares favorably with existing techniques.
Abstract: This paper presents an innovative model of a program's internal behavior over a set of test inputs, called the probabilistic program dependence graph (PPDG), which facilitates probabilistic analysis and reasoning about uncertain program behavior, particularly that associated with faults. The PPDG construction augments the structural dependences represented by a program dependence graph with estimates of statistical dependences between node states, which are computed from the test set. The PPDG is based on the established framework of probabilistic graphical models, which are used widely in a variety of applications. This paper presents algorithms for constructing PPDGs and applying them to fault diagnosis. The paper also presents preliminary evidence indicating that a PPDG-based fault localization technique compares favorably with existing techniques. The paper also presents evidence indicating that PPDGs can be useful for fault comprehension.
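
The statistical part of a PPDG amounts to estimating, from test executions, how likely a node is to be in a given abstract state given the states of the nodes it depends on. A bare-bones frequency estimator over hypothetical execution records, omitting the paper's state abstraction and smoothing, might look like this:

```python
from collections import Counter, defaultdict

def estimate_conditional_probs(executions, node, parents):
    """Estimate P(node_state | parent_states) by counting over test executions.

    Each execution is a dict mapping node name -> abstract state observed in
    that run. Smoothing and the paper's state abstraction are omitted.
    """
    counts = defaultdict(Counter)
    for run in executions:
        parent_states = tuple(run[p] for p in parents)
        counts[parent_states][run[node]] += 1
    return {ps: {s: n / sum(c.values()) for s, n in c.items()}
            for ps, c in counts.items()}

# Hypothetical runs: node n3 depends on n1 and n2; states are value abstractions.
executions = [
    {"n1": "pos", "n2": "zero", "n3": "taken"},
    {"n1": "pos", "n2": "zero", "n3": "taken"},
    {"n1": "pos", "n2": "zero", "n3": "not_taken"},
    {"n1": "neg", "n2": "zero", "n3": "not_taken"},
]
print(estimate_conditional_probs(executions, node="n3", parents=["n1", "n2"]))
```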

Journal ArticleDOI
TL;DR: The paper describes the main features of the prototype developed to validate the proposed repair approach for composed Web services, presents the self-healing architecture for repair handling, and illustrates the experimental results.
Abstract: This paper proposes a self-healing approach to handle exceptions in service-based processes and to repair the faulty activities with a model-based approach. In particular, a set of repair actions is defined in the process model, and repairability of the process is assessed by analyzing the process structure and the available repair actions. During execution, when an exception arises, repair plans are generated by taking into account constraints posed by the process structure, dependencies among data, and available repair actions. The paper also describes the main features of the prototype developed to validate the proposed repair approach for composed Web services; the self-healing architecture for repair handling and the experimental results are illustrated.

Journal ArticleDOI
TL;DR: This paper describes the exception handling patterns using three process modeling notations: UML 2.0 Activity Diagrams, BPMN, and Little-JIL and discusses the relative merits of the three notations with respect to their ability to represent these patterns.
Abstract: Process modeling allows for analysis and improvement of processes that coordinate multiple people and tools working together to carry out a task. Process modeling typically focuses on the normative process, that is, how the collaboration transpires when everything goes as desired. Unfortunately, real-world processes rarely proceed that smoothly. A more complete analysis of a process requires that the process model also include details about what to do when exceptional situations arise. We have found that, in many cases, there are abstract patterns that capture the relationship between exception handling tasks and the normative process. Just as object-oriented design patterns facilitate the development, documentation, and maintenance of object-oriented programs, we believe that process patterns can facilitate the development, documentation, and maintenance of process models. In this paper, we focus on the exception handling patterns that we have observed over many years of process modeling. We describe these patterns using three process modeling notations: UML 2.0 Activity Diagrams, BPMN, and Little-JIL. We present both the abstract structure of the pattern as well as examples of the pattern in use. We also provide some preliminary statistical survey data to support the claim that these patterns are found commonly in actual use and discuss the relative merits of the three notations with respect to their ability to represent these patterns.

Journal ArticleDOI
TL;DR: This paper defines and applies a new model of adaptive behavior called an Adaptation Finite-State Machine (A-FSM) to enable the detection of faults caused by both erroneous adaptation logic and asynchronous updating of context information, with the latter leading to inconsistencies between the external physical context and its internal representation within an application.
Abstract: Applications running on mobile devices are intensely context-aware and adaptive. Streams of context values continuously drive these applications, making them very powerful but, at the same time, susceptible to undesired configurations. Such configurations are not easily exposed by existing validation techniques, thereby leading to new analysis and testing challenges. In this paper, we address some of these challenges by defining and applying a new model of adaptive behavior called an Adaptation Finite-State Machine (A-FSM) to enable the detection of faults caused by both erroneous adaptation logic and asynchronous updating of context information, with the latter leading to inconsistencies between the external physical context and its internal representation within an application. We identify a number of adaptation fault patterns, each describing a class of faulty behaviors. Finally, we describe three classes of algorithms to detect such faults automatically via analysis of the A-FSM. We evaluate our approach and the trade-offs between the classes of algorithms on a set of synthetically generated Context-Aware Adaptive Applications (CAAAs) and on a simple but realistic application in which a cell phone's configuration profile changes automatically as a result of changes to the user's location, speed, and surrounding environment. Our evaluation describes the faults our algorithms are able to detect and compares the algorithms in terms of their performance and storage requirements.
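
One of the simpler fault classes such an analysis can look for is an adaptation state that no sequence of context changes can ever activate. The reachability check below, over a hypothetical profile-switching A-FSM, illustrates the style of analysis but not the paper's specific algorithms or fault patterns.

```python
from collections import deque

def unreachable_states(states, initial, transitions):
    """Return states that no sequence of context changes can ever activate.

    `transitions` maps a state to {context_predicate: next_state}.
    """
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in transitions.get(queue.popleft(), {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(states) - seen

# Hypothetical A-FSM for a phone profile: 'outdoor_loud' is never switched to.
states = {"normal", "silent", "driving", "outdoor_loud"}
transitions = {
    "normal": {"in_meeting": "silent", "speed>30": "driving"},
    "silent": {"meeting_over": "normal"},
    "driving": {"speed<5": "normal"},
}
print(unreachable_states(states, "normal", transitions))  # {'outdoor_loud'}
```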

Journal ArticleDOI
TL;DR: DUALLy is presented, an automated framework for interoperability among architectural languages and tools: given a number of architectural languages and tools, they can all interoperate thanks to automated model transformation techniques.
Abstract: Many architectural languages have been proposed in the last 15 years, each one with the chief aim of becoming the ideal language for specifying software architectures. What is evident nowadays, instead, is that architectural languages are defined by stakeholder concerns. Capturing all such concerns within a single, narrowly focused notation is impossible. At the same time, it is also impractical to define and use a "universal" notation, such as UML. As a result, many domain-specific notations for architectural modeling have been proposed, each one focusing on a specific application domain, analysis type, or modeling environment. As a drawback, a proliferation of languages exists, each one with its own specific notation, tools, and domain specificity. No effective interoperability is possible to date. Therefore, if a software architect has to model a concern not supported by his own language/tool, he has to manually transform (and possibly keep aligned) the available architectural specification into the required language/tool. This paper presents DUALLy, an automated framework that enables interoperability among architectural languages and tools. Given a number of architectural languages and tools, they can all interoperate thanks to automated model transformation techniques. DUALLy is implemented as an Eclipse plugin. Putting it in practice, we apply the DUALLy approach to the Darwin/FSP ADL and to a UML2.0 profile for software architectures. Using a complex industrial system, we transform a UML software architecture specification into Darwin/FSP, perform verifications using LTSA, and reflect changes required by the verifications back to the UML specification.

Journal ArticleDOI
TL;DR: In addressing a well-bounded research question, groups of researchers with similar domain experience can arrive at the same review outcomes, even though they may do so in different ways, providing evidence that the systematic review is a robust research method.
Abstract: BACKGROUND-The systematic review is becoming a more commonly employed research instrument in empirical software engineering. Before undue reliance is placed on the outcomes of such reviews it would seem useful to consider the robustness of the approach in this particular research context. OBJECTIVE-The aim of this study is to assess the reliability of systematic reviews as a research instrument. In particular, we wish to investigate the consistency of process and the stability of outcomes. METHOD-We compare the results of two independent reviews undertaken with a common research question. RESULTS-The two reviews find similar answers to the research question, although the means of arriving at those answers vary. CONCLUSIONS-In addressing a well-bounded research question, groups of researchers with similar domain experience can arrive at the same review outcomes, even though they may do so in different ways. This provides evidence that, in this context at least, the systematic review is a robust research method.

Journal ArticleDOI
TL;DR: This paper develops a novel specification-based approach for efficiently generating tests for products in a software product line by introducing an automatic technique for mapping a formula that specifies a feature into a transformation that defines incremental refinement of test suites.
Abstract: Recent advances in mechanical techniques for systematic testing have increased our ability to automatically find subtle bugs, and hence, to deploy more dependable software. This paper builds on one such systematic technique, scope-bounded testing, to develop a novel specification-based approach for efficiently generating tests for products in a software product line. Given properties of features as first-order logic formulas in Alloy, our approach uses SAT-based analysis to automatically generate test inputs for each product in a product line. To ensure soundness of generation, we introduce an automatic technique for mapping a formula that specifies a feature into a transformation that defines incremental refinement of test suites. Our experimental results using different data structure product lines show that an incremental approach can provide an order of magnitude speedup over conventional techniques. We also present a further optimization using dedicated integer constraint solvers for feature properties that introduce integer constraints, and show how to use a combination of solvers in tandem for solving Alloy formulas.

Journal ArticleDOI
TL;DR: Interactive search-based approaches using evolutionary computation and software agents are investigated in experimental upstream design episodes for two example design domains, showing that interactive evolutionary search, supported by software agents, appears highly promising.
Abstract: Although much evidence exists to suggest that early life cycle software engineering design is a difficult task for software engineers to perform, current computational tool support for software engineers is limited. To address this limitation, interactive search-based approaches using evolutionary computation and software agents are investigated in experimental upstream design episodes for two example design domains. Results show that interactive evolutionary search, supported by software agents, appears highly promising. As an open system, search is steered jointly by designer preferences and software agents. Directly traceable to the design problem domain, a mass of useful and interesting class designs is arrived at which may be visualized by the designer with quantitative measures of structural integrity, such as design coupling and class cohesion. The class designs are found to be of equivalent or better coupling and cohesion when compared to a manual class design for the example design domains, and by exploiting concurrent execution, the runtime performance of the software agents is highly favorable.

Journal ArticleDOI
TL;DR: Results suggest that organizations employing developers with low experience can achieve a significant performance improvement by adopting stereotyped UML diagrams for Web applications, since the availability of stereotypes reduces the gap between subjects with low skill or experience and highly skilled or experienced subjects.
Abstract: In recent years, several design notations have been proposed to model domain-specific applications or reference architectures. In particular, Conallen has proposed the UML Web Application Extension (WAE): a UML extension to model Web applications. The aim of our empirical investigation is to test whether the usage of the Conallen notation supports comprehension and maintenance activities with significant benefits, and whether such benefits depend on developers' ability and experience. This paper reports and discusses the results of a series of four experiments performed in different locations and with subjects possessing different experience levels, namely undergraduate students, graduate students, and research associates, and different ability levels. The experiments compare the performance of subjects in comprehension tasks where they have the source code complemented either by standard UML diagrams or by diagrams stereotyped using the Conallen notation. Results indicate that, although in general it is not possible to observe any significant benefit associated with the usage of stereotyped diagrams, the availability of stereotypes reduces the gap between subjects with low skill or experience and highly skilled or experienced subjects. Results suggest that organizations employing developers with low experience can achieve a significant performance improvement by adopting stereotyped UML diagrams for Web applications.

Journal ArticleDOI
TL;DR: This paper presents a novel comprehensive approach for reverse engineering and performance prediction of components using genetic programming for reconstructing a behavior model from monitoring data, runtime bytecode counts, and static bytecode analysis.
Abstract: In component-based software engineering, existing components are often reused in new applications. Correspondingly, the response time of an entire component-based application can be predicted from the execution durations of individual component services. These execution durations depend on the runtime behavior of a component which itself is influenced by three factors: the execution platform, the usage profile, and the component wiring. To cover all relevant combinations of these influencing factors, conventional prediction of response times requires repeated deployment and measurements of component services for all such combinations, incurring a substantial effort. This paper presents a novel comprehensive approach for reverse engineering and performance prediction of components. In it, genetic programming is utilized for reconstructing a behavior model from monitoring data, runtime bytecode counts, and static bytecode analysis. The resulting behavior model is parameterized over all three performance-influencing factors, which are specified separately. This results in significantly fewer measurements: The behavior model is reconstructed only once per component service, and one application-independent bytecode benchmark run is sufficient to characterize an execution platform. To predict the execution durations for a concrete platform, our approach combines the behavior model with platform-specific benchmarking results. We validate our approach by predicting the performance of a file sharing application.
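
The prediction step combines two separately obtained ingredients: how often each bytecode instruction is executed by a service (from the reconstructed behavior model, for a given usage profile) and how long each instruction takes on the target platform (from the benchmark run). A deliberately simplified sketch of that combination, with invented opcode counts and costs, is:

```python
def predict_duration(bytecode_counts, platform_costs_ns):
    """Predicted execution time = sum over opcodes of (count * per-opcode cost).

    `bytecode_counts` would come from the parameterized behavior model for a
    given usage profile; `platform_costs_ns` from one benchmark run on the
    target platform. Both dictionaries below are hypothetical.
    """
    return sum(count * platform_costs_ns.get(op, 0.0)
               for op, count in bytecode_counts.items())

# Counts for one invocation of a hypothetical 'uploadFile' service on a 1 MB input.
bytecode_counts = {"iload": 48_000, "invokevirtual": 2_300, "getfield": 31_000}
# Per-instruction costs measured by an application-independent benchmark (ns).
platform_costs_ns = {"iload": 0.4, "invokevirtual": 6.5, "getfield": 0.9}

print(f"predicted duration: {predict_duration(bytecode_counts, platform_costs_ns) / 1e6:.3f} ms")
```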

Journal ArticleDOI
TL;DR: This paper proposes a reliability and testing resources allocation model that is able to provide solutions at various levels of detail, depending upon the information the engineer has about the system, and aims to quantitatively identify the most critical components of software architecture in order to best assign the testing resources to them.
Abstract: With software systems increasingly being employed in critical contexts, assuring high reliability levels for large, complex systems can incur huge verification costs. Existing standards usually assign predefined risk levels to components in the design phase, to provide some guidelines for the verification. This is a coarse-grained assignment that does not consider the costs and does not provide a sufficient modeling basis to let engineers quantitatively optimize resource usage. Software reliability allocation models partially address such issues, but they usually make so many assumptions on the input parameters that their application is difficult in practice. In this paper, we try to reduce this gap, proposing a reliability and testing resource allocation model that is able to provide solutions at various levels of detail, depending upon the information the engineer has about the system. The model aims to quantitatively identify the most critical components of the software architecture in order to best assign the testing resources to them. A tool for the solution of the model is also developed. The model is applied to an empirical case study, a program developed for the European Space Agency, to verify the model's prediction abilities and evaluate the impact of parameter estimation errors on the prediction accuracy.

Journal ArticleDOI
TL;DR: This comparison concentrates on SPMLs most representative of the various alternative approaches, ranging from UML-based framework specializations to full-blown executable metamodeling approaches, and proposes a frame gathering a set of requirements for process modeling.
Abstract: Describing and managing activities, resources, and constraints of software development processes is a challenging goal for many organizations. A first generation of Software Process Modeling Languages (SPMLs) appeared in the 1990s but failed to gain broad industrial support. Recently, however, a second generation of SPMLs has appeared, leveraging the strong industrial interest for modeling languages such as UML. In this paper, we propose a comparison of these UML-based SPMLs. While not exhaustive, this comparison concentrates on SPMLs most representative of the various alternative approaches, ranging from UML-based framework specializations to full-blown executable metamodeling approaches. To support the comparison of these various approaches, we propose a frame gathering a set of requirements for process modeling, such as semantic richness, modularity, executability, conformity to the UML standard, and formality. Beyond discussing the relative merits of these approaches, we also evaluate the overall suitability of these UML-based SPMLs for software process modeling. Finally, we discuss the impact of these approaches on the current state of the practice, and conclude with lessons we have learned in doing this comparison.

Journal ArticleDOI
TL;DR: The mechanisms devised as part of this model fall into two categories: asynchronous event handling and synchronous exception handling, which enable designing recovery actions to handle different kinds of failure conditions arising in context-aware applications.
Abstract: In this paper, we present a forward recovery model for programming robust context-aware applications. The mechanisms devised as part of this model fall into two categories: asynchronous event handling and synchronous exception handling. These mechanisms enable designing recovery actions to handle different kinds of failure conditions arising in context-aware applications. These include service discovery failures, service binding failures, exceptions raised by a service, and context invalidations. This model is integrated in the high-level programming framework that we have designed for building context-aware collaborative (CSCW) applications. In this paper, we demonstrate the capabilities of this model for programming various kinds of recovery patterns in context-aware applications.

Journal ArticleDOI
TL;DR: A benchmark suite is proposed to improve the situation, along with a community effort to contribute to and evolve it; the suite allows comparing detection techniques and helps improve the accuracy of detecting design pattern occurrences.
Abstract: Detection of design pattern occurrences is part of several solutions to software engineering problems, and high accuracy of detection is important to help solve the actual problems. The improvement in accuracy of design pattern occurrence detection requires some way of evaluating various approaches. Currently, there are several different methods used in the community to evaluate accuracy. We show that these differences may greatly influence the accuracy results, which makes it nearly impossible to compare the quality of different techniques. We propose a benchmark suite to improve the situation and a community effort to contribute to, and evolve, the benchmark suite. Also, we propose fine-grained metrics assessing the accuracy of various approaches in the benchmark suite. This allows comparing the detection techniques and helps improve the accuracy of detecting design pattern occurrences.
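
The accuracy assessment ultimately reduces to comparing reported pattern occurrences against the benchmark's ground truth. A minimal precision/recall computation over hypothetical Observer occurrences, each identified by its role-to-class bindings, looks like this (the benchmark's finer-grained metrics are not reproduced here):

```python
def precision_recall(reported, ground_truth):
    """Compare detected pattern occurrences against benchmark ground truth.

    Occurrences are represented as hashable role bindings, e.g. frozensets of
    (role, class) pairs, so two tools reporting the same occurrence agree exactly.
    """
    true_positives = reported & ground_truth
    precision = len(true_positives) / len(reported) if reported else 1.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 1.0
    return precision, recall

# Hypothetical Observer occurrences in some system under analysis.
ground_truth = {
    frozenset({("Subject", "EventBus"), ("Observer", "Logger")}),
    frozenset({("Subject", "Model"), ("Observer", "View")}),
}
reported = {
    frozenset({("Subject", "EventBus"), ("Observer", "Logger")}),
    frozenset({("Subject", "Cache"), ("Observer", "Stats")}),   # false positive
}
print(precision_recall(reported, ground_truth))  # (0.5, 0.5)
```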

Journal ArticleDOI
TL;DR: A new refinement operator WPα is introduced that uses only the alias information obtained by symbolically executing a test to refine abstractions in a sound manner and is implemented in a tool called YOGI that plugs into Microsoft's Static Driver Verifier framework.
Abstract: We present an algorithm DASH to check if a program P satisfies a safety property φ. The unique feature of this algorithm is that it uses only test generation operations, and it refines and maintains a sound program abstraction as a consequence of failed test generation operations. Thus, each iteration of the algorithm is inexpensive, and can be implemented without any global may-alias information. In particular, we introduce a new refinement operator WPα that uses only the alias information obtained by symbolically executing a test to refine abstractions in a sound manner. We present a full exposition of the DASH algorithm and its theoretical properties. We have implemented DASH in a tool called YOGI that plugs into Microsoft's Static Driver Verifier framework. We have used this framework to run YOGI on 69 Windows Vista drivers with 85 properties and find that YOGI scales much better than SLAM, the current engine driving Microsoft's Static Driver Verifier.

Journal ArticleDOI
TL;DR: This paper applies directed explicit state-space search to discrete and continuous-time Markov chains in order to compute counterexamples for the violation of PCTL or CSL properties.
Abstract: Current stochastic model checkers do not make counterexamples for property violations readily available. In this paper, we apply directed explicit state-space search to discrete and continuous-time Markov chains in order to compute counterexamples for the violation of PCTL or CSL properties. Directed explicit state-space search algorithms explore the state space on-the-fly, which makes our method very efficient and highly scalable. They can also be guided using heuristics which usually improve the performance of the method. Counterexamples provided by our method have two important properties. First, they include those traces which contribute the greatest amount of probability to the property violation. Hence, they show the most probable offending execution scenarios of the system. Second, the obtained counterexamples tend to be small. Hence, they can be effectively analyzed by a human user. Both properties make the counterexamples obtained by our method very useful for debugging purposes. We implemented our method based on the stochastic model checker PRISM and applied it to a number of case studies in order to illustrate its applicability.
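
For a plain reachability violation on a discrete-time Markov chain, the single most probable offending execution can be found by a shortest-path search in which each transition is weighted by the negative logarithm of its probability. The sketch below uses Dijkstra's algorithm on an invented chain; the paper's directed (heuristic) search and its handling of full PCTL/CSL properties are omitted.

```python
import heapq
import math

def most_probable_path(transitions, start, bad_states):
    """Dijkstra on -log(probability): returns (probability, path) of the most
    probable execution from `start` into a property-violating state."""
    best = {start: 0.0}
    queue = [(0.0, start, [start])]
    while queue:
        cost, state, path = heapq.heappop(queue)
        if state in bad_states:
            return math.exp(-cost), path
        if cost > best.get(state, math.inf):
            continue
        for nxt, prob in transitions.get(state, {}).items():
            ncost = cost - math.log(prob)
            if ncost < best.get(nxt, math.inf):
                best[nxt] = ncost
                heapq.heappush(queue, (ncost, nxt, path + [nxt]))
    return 0.0, []

# Hypothetical DTMC where reaching 'fail' violates the property of interest.
transitions = {
    "init": {"work": 0.9, "fail": 0.1},
    "work": {"init": 0.5, "done": 0.45, "fail": 0.05},
    "done": {"done": 1.0},
}
print(most_probable_path(transitions, "init", {"fail"}))  # (0.1, ['init', 'fail'])
```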

Journal ArticleDOI
TL;DR: This paper describes the methodology used in testing the two real-world electronic voting systems evaluated, the findings of the analysis, the attacks, and the lessons learned.
Abstract: Voting is the process through which a democratic society determines its government. Therefore, voting systems are as important as other well-known critical systems, such as air traffic control systems or nuclear plant monitors. Unfortunately, voting systems have a history of failures that seems to indicate that their quality is not up to the task. Because of the alarming frequency and impact of the malfunctions of voting systems, in recent years a number of vulnerability analysis exercises have been carried out against voting systems to determine if they can be compromised in order to control the results of an election. We have participated in two such large-scale projects, sponsored by the Secretaries of State of California and Ohio, whose goals were to perform the security testing of the electronic voting systems used in their respective states. As the result of the testing process, we identified major vulnerabilities in all of the systems analyzed. We then took advantage of a combination of these vulnerabilities to generate a series of attacks that would spread across the voting systems and would “steal” votes by combining voting record tampering with social engineering approaches. As a response to the two large-scale security evaluations, the Secretaries of State of California and Ohio recommended changes to improve the security of the voting process. In this paper, we describe the methodology that we used in testing the two real-world electronic voting systems we evaluated, the findings of our analysis, our attacks, and the lessons we learned.