
Showing papers in "Software Testing, Verification & Reliability in 2015"


Journal ArticleDOI
TL;DR: Results from a controlled experiment show that the use of mutation as a testing technique provides benefits to the fault localization process, and fault localization is significantly improved by using mutation‐based tests instead of block‐based or branch‐based test suites.
Abstract: Fault localization methods seek to identify faulty program statements based on the information provided by the failing and passing test executions. Spectrum-based methods are among the most popular ones and assist programmers by assigning suspiciousness values to program statements according to their probability of being faulty. This paper proposes Metallaxis, a fault localization approach based on mutation analysis. The innovative part of Metallaxis is that it uses mutants and links them with the faulty program locations. Thus, mutants that are killed mostly by failing tests provide a good indication about the location of a fault. Experimentation using Metallaxis suggests that it is significantly more effective than statement-based approaches. This is true even in the case where mutation cost-reduction techniques, such as mutant sampling, are employed. Additionally, results from a controlled experiment show that the use of mutation as a testing technique provides benefits to the fault localization process. Therefore, fault localization is significantly improved by using mutation-based tests instead of block-based or branch-based test suites. Finally, evidence in support of the method's scalability is also given. Copyright © 2013 John Wiley & Sons, Ltd.
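To make the ranking idea concrete, here is a minimal sketch of mutation-based suspiciousness scoring in Python. The kill data and the Ochiai-style formula are illustrative assumptions, not the paper's exact definitions.

```python
# Illustrative sketch: rank mutants by how strongly their kills concentrate in
# failing tests; a mutant killed mostly by failing tests points at the fault.
import math

def suspiciousness(killed_by_failing, killed_by_passing, total_failing):
    """Ochiai-style score: high when kills come predominantly from failing tests."""
    killed_total = killed_by_failing + killed_by_passing
    if killed_total == 0 or total_failing == 0:
        return 0.0
    return killed_by_failing / math.sqrt(total_failing * killed_total)

# Hypothetical kill matrix: mutant -> (kills by failing tests, kills by passing tests)
kills = {"m1": (4, 0), "m2": (1, 5), "m3": (3, 1)}
total_failing = 4

ranked = sorted(kills, key=lambda m: suspiciousness(*kills[m], total_failing), reverse=True)
print(ranked)  # ['m1', 'm3', 'm2'] -- statements hosting top-ranked mutants are inspected first
```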

235 citations


Journal ArticleDOI
TL;DR: This paper presents a case study of coverage‐based regression testing techniques on a real world industrial system with real regression faults, and shows that prioritization techniques that are based on additional coverage with finer grained coverage criteria perform significantly better in fault detection rates.
Abstract: This paper presents a case study of coverage-based regression testing techniques on a real world industrial system with real regression faults. The study evaluates four common prioritization techniques, a test selection technique, a test suite minimization technique and a hybrid approach that combines selection and minimization. The study also examines the effects of using various coverage criteria on the effectiveness of the studied approaches. The results show that prioritization techniques that are based on additional coverage with finer grained coverage criteria perform significantly better in fault detection rates. The study also reveals that using modification information in prioritization techniques does not significantly enhance fault detection rates. The results show that test selection does not provide significant savings in execution cost (<2%), which might be attributed to the nature of the changes made to the system. Test suite minimization using finer grained coverage criteria could provide significant savings in execution cost (79.5%) while maintaining a fault detection capability level above 70%, thus representing a possible trade-off. The hybrid technique did not provide a significant improvement over traditional minimization techniques. Copyright © 2015 John Wiley & Sons, Ltd.
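As a rough illustration of the "additional coverage" greedy prioritization family evaluated in the study, the Python sketch below orders tests so that each next test covers the most not-yet-covered elements; the test names and coverage sets are hypothetical.

```python
# Sketch of additional-greedy test prioritization (hypothetical coverage data).
def additional_greedy(coverage):
    """coverage: test name -> set of covered elements (e.g. branches)."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            if not remaining[best]:          # leftover tests add no coverage at all
                order.extend(remaining)
                break
            covered = set()                  # everything reachable covered: reset and continue
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order

coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5}}
print(additional_greedy(coverage))  # ['t1', 't2', 't3', 't4']
```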

94 citations


Journal ArticleDOI
TL;DR: The effectiveness of this approach is shown and it is discussed how to gain a more expressive test suite by combining cheap but undirected random test case generation with the more expensive but directed mutation‐based technique.
Abstract: This article presents the techniques and results of a novel model-based test case generation approach that automatically derives test cases from UML state machines. The main contribution of this article is the fully automated fault-based test case generation technique together with two empirical case studies derived from industrial use cases. Also, an in-depth evaluation of different fault-based test case generation strategies on each of the case studies is given and a comparison with plain random testing is conducted. The test case generation methodology supports a wide range of UML constructs and is grounded on the formal semantics of Back's action systems and the well-known input-output conformance relation. Mutation operators are employed on the level of the specification to insert faults and generate test cases that will reveal the faults inserted. The effectiveness of this approach is shown, and it is discussed how a more expressive test suite can be gained by combining cheap but undirected random test case generation with the more expensive but directed mutation-based technique. Finally, an extensive and critical discussion of the lessons learnt is given, as well as a future outlook on the general usefulness and practicability of mutation-based test case generation. Copyright © 2014 John Wiley & Sons, Ltd.

83 citations


Journal ArticleDOI
TL;DR: Results of an empirical evaluation indicate the quantitative superiority of MODEP with respect to single-objective predictors, with respect to a trivial baseline ranking classes by size in ascending or descending order, and with respect to an alternative approach for cross-project prediction based on local prediction upon clusters of similar classes.
Abstract: In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, coined as multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors achieving a specific compromise between the number of likely defect-prone classes or the number of defects that the analysis would likely discover (effectiveness), and the lines of code to be analysed/tested (which can be considered a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to a trivial baseline ranking classes by size in ascending or descending order. Also, MODEP outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes. Copyright © 2015 John Wiley & Sons, Ltd.
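The cost/effectiveness trade-off described above can be pictured as a Pareto front. The sketch below (hypothetical predictor points, not MODEP's actual algorithm) keeps the predictors that are not dominated on (lines of code to inspect, defects found).

```python
# Sketch of non-dominated ("Pareto-optimal") selection over (cost, effectiveness) points.
def pareto_front(points):
    """points: list of (cost, effectiveness); lower cost and higher effectiveness are better."""
    return [
        (c, e)
        for c, e in points
        if not any(c2 <= c and e2 >= e and (c2, e2) != (c, e) for c2, e2 in points)
    ]

candidates = [(1000, 5), (1500, 9), (2000, 9), (800, 2)]  # hypothetical predictors
print(pareto_front(candidates))  # [(1000, 5), (1500, 9), (800, 2)] -- (2000, 9) is dominated
```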

71 citations


Journal ArticleDOI
TL;DR: The total mutation analysis run time decreased by more than 20% by removing redundant mutants, and the inclusion of redundant mutants led to an overestimated mutation score for all analysed test suites.
Abstract: Mutation analysis is a powerful but computationally expensive method to measure the effectiveness of a testing or debugging technique. The high cost is due, in part, to redundant mutants generated by commonly used mutation operators. A mutant is said to be redundant if its outcome can be predicted based on the outcome of other mutants. The execution of those redundant mutants is unnecessary and wastes CPU resources. Moreover, the inclusion of redundant mutants may lead to a skewed mutant detection rate and therefore misrepresent the effectiveness of the assessed testing or debugging technique. This paper extends previous work and makes the following contributions. First, it defines and provides non-redundant versions of the conditional operator replacement, unary operator insertion, and relational operator replacement mutation operators. Second, it reports on an empirical study using 10 real-world programmes that comprise a total of 410,000 lines of code. The empirical study used developer-written and generated test suites. The results show how prevalent redundant mutants are and how their elimination improves the efficiency and accuracy of mutation analysis. In summary, the total mutation analysis run time decreased by more than 20% by removing redundant mutants, and the inclusion of redundant mutants led to an overestimated mutation score for all analysed test suites. Copyright © 2014 John Wiley & Sons, Ltd.
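A toy illustration of the score inflation mentioned above (the kill data are assumed, not taken from the study): a redundant mutant is killed whenever some other mutant is killed, so counting it adds "free" kills and overestimates test quality.

```python
# Toy example: including a redundant mutant inflates the mutation score.
def mutation_score(kill_map, mutants):
    killed = sum(1 for m in mutants if kill_map[m])
    return killed / len(mutants)

kill_map = {                    # mutant -> killed by the suite? (hypothetical outcomes)
    "m1": True, "m2": False, "m3": True,
    "m1_redundant": True,       # outcome implied by m1; adds no information
}
print(mutation_score(kill_map, ["m1", "m2", "m3"]))                  # 0.67
print(mutation_score(kill_map, ["m1", "m2", "m3", "m1_redundant"]))  # 0.75 (overestimated)
```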

45 citations


Journal ArticleDOI
TL;DR: The evaluation shows that UNICORN can effectively compute and rank the patterns that represent concurrency bugs, and perform computation and ranking with reasonable efficiency.
Abstract: UNICORN is an automated dynamic pattern-detection-based technique that finds and ranks problematic memory access patterns for non-deadlock concurrency bugs. It monitors pairs of memory accesses, combines the pairs into problematic patterns and ranks the patterns by their suspiciousness scores. It detects significant classes of bug types, including order violations and both single-variable and multivariable atomicity violations, which have been shown to be the most important classes of non-deadlock concurrency bugs. This paper describes the UNICORN approach and its implementations in Java and C++, and evaluates these implementations empirically. The evaluation shows that UNICORN can effectively compute and rank the patterns that represent concurrency bugs, and perform computation and ranking with reasonable efficiency. Copyright © 2014 John Wiley & Sons, Ltd.

34 citations


Journal ArticleDOI
TL;DR: This article presents a novel cost analysis framework for concurrent objects by means of a novel form of object‐sensitive recurrence equations that use cost centres in order to keep the resource usage assigned to the different components separate.
Abstract: This article presents a novel cost analysis framework for concurrent objects. Concurrent objects form a well-established model for distributed concurrent systems. In this model, objects are the concurrency units that communicate among themselves via asynchronous method calls. Cost analysis aims at automatically approximating the resource consumption of executing a program in terms of its input parameters. While cost analysis for sequential programming languages has received considerable attention, concurrency and distribution have been notably less studied. The main challenges of cost analysis in a concurrent setting are as follows. First, inferring precise size abstractions for data in the program in the presence of shared memory. This information is essential for bounding the number of iterations of loops. Second, distribution suggests that analysis must infer the cost of the diverse distributed components separately. We handle this by means of a novel form of object-sensitive recurrence equations that use cost centres in order to keep the resource usage assigned to the different components separate. We have implemented our analysis and evaluated it on several small applications that are classical examples of concurrent and distributed programming. Copyright © 2015 John Wiley & Sons, Ltd.
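As a schematic example of the cost-centre idea (illustrative notation, not the article's exact formalism), consider a loop run by object o1 that performs three local steps and issues one asynchronous call to object o2 per iteration; attributing each step's cost to the object that executes it keeps the components separate in both the recurrence and its closed form.

```latex
% c(o) denotes the cost centre of object o.
\begin{align*}
  C(n) &= 3\,c(o_1) + 1\,c(o_2) + C(n-1), \qquad C(0) = 0,\\
  C(n) &= 3n\,c(o_1) + n\,c(o_2).
\end{align*}
```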

34 citations


Journal ArticleDOI
TL;DR: Overall, the study shows that I‐EQM substantially improves previous methods by retrieving a considerably higher number of killable mutants, thus, amplifying the quality of the testing process.
Abstract: The equivalent mutant problem is a major hindrance to mutation testing. Being undecidable in general, it is only susceptible to partial solutions. In this paper, mutant classification is utilised for isolating likely first-order equivalent mutants. A new classification technique, Isolating Equivalent Mutants (I-EQM), is introduced and empirically investigated. The proposed approach employs a dynamic execution scheme that integrates the impact on the program execution of first-order mutants with the impact on the output of second-order mutants. An experimental study, conducted using two independently created sets of manually classified mutants selected from real-world programs, revalidates previously published results and provides evidence for the effectiveness of the proposed technique. Overall, the study shows that I-EQM substantially improves previous methods by retrieving a considerably higher number of killable mutants, thus amplifying the quality of the testing process. Copyright © 2014 John Wiley & Sons, Ltd.

33 citations


Journal ArticleDOI
TL;DR: Model transformation traceability, in conjunction with a model of mutation operators and a dedicated algorithm, allows improved test models to be produced automatically or semi-automatically.
Abstract: A benefit of model-driven engineering is the automatic generation of artefacts from high-level models through intermediary levels using model transformations. In such a process, the input must be well designed, and the model transformations should be trustworthy. Because of the specificities of models and transformations, classical software test techniques have to be adapted. Among these techniques, mutation analysis has been ported, and a set of mutation operators has been defined. However, it currently requires considerable manual work and suffers from the test data set improvement activity. This activity is a difficult and time-consuming job and reduces the benefits of the mutation analysis. This paper addresses the test data set improvement activity. Model transformation traceability, in conjunction with a model of mutation operators and a dedicated algorithm, allows improved test models to be produced automatically or semi-automatically. The approach is validated and illustrated in two case studies written in Kermeta. Copyright © 2014 John Wiley & Sons, Ltd.

30 citations


Journal ArticleDOI
Shin Hong1, Moonzoo Kim1
TL;DR: A formal execution model is presented, which can uniformly represent various views of race bug detection techniques on target programme execution and classify race bugs according to whether a bug violates an operation block specification and/or a data association specification.
Abstract: As multithreaded programmes become popular to fully utilize multicore CPUs, many race bug detection techniques have been developed to find concurrency errors in multithreaded programmes effectively. Because these techniques have different views on target programme execution and detect race bugs of various types, it is difficult to characterize, compare and improve race bug detection techniques. This paper presents a formal execution model, which can uniformly represent various views of race bug detection techniques on target programme execution. Then, this paper classifies 43 race bug detection techniques according to their target race bugs. We classify race bugs according to whether a bug violates an operation block specification and/or a data association specification. This survey provides researchers with a clear top-down view of various race bug detection techniques. In addition, the concrete examples of various race bugs in this survey can help field engineers avoid race bugs in their multithreaded programmes. Copyright © 2014 John Wiley & Sons, Ltd.

27 citations


Journal ArticleDOI
TL;DR: This article summarizes the modelled behaviour of a large number of systems by providing seven specification guidelines to keep state spaces small, and provides an application, generally from the realm of traffic light controllers, for which the good and bad models are both suitable for their purpose but are not behaviourally equivalent.
Abstract: During the last two decades, we have modelled the behaviour of a large number of systems. We noted that different styles of modelling had quite an effect on the size of the state spaces of the modelled systems. The differences were so substantial that some specification styles led to far too many states to verify the correctness of the model, whereas with other styles, the number of states was so small that verification was a straightforward activity. In this article, we summarize our experience by providing seven specification guidelines to keep state spaces small. For each guideline, we give an application, generally from the realm of traffic light controllers, for which we provide a 'bad' model with a large state space and a 'good' model with a small state space. The good and bad models are both suitable for their purpose but are not behaviourally equivalent. For all guidelines, we discuss circumstances under which it is reasonable to apply the guidelines. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A possible solution to the problem of determining whether a mutant can be killed by a test case or it cannot be killed easily because the mutant is semantically equivalent to the original programme is proposed.
Abstract: Mutation testing is a fault-based software testing technique in which a large number of mutants are generated in order to assess the adequacy of test cases devised. One of the daunting problems in this area is determining whether a mutant can be killed by a test case or whether it cannot be killed easily because the mutant is semantically equivalent to the original programme. A possible solution, as proposed in this paper, is to measure the complexity of each mutant and prioritize them according to how easy or hard they are to expose. As a result, using a proper metric based on the mutants' complexity, the tester may decide whether to focus on killing easy or hard mutants first.

Journal ArticleDOI
TL;DR: This work introduces a metric that is naturally extended to mutation operators and may be used to reduce the number of mutants, particularly equivalent mutants; a firm mutation analysis tool for WS-BPEL service compositions is also presented.
Abstract: Mutation testing is a successful testing technique based on fault injection. However, it can be very costly, and several cost-reduction techniques for reducing the number of mutants have been proposed in the literature. Cost reduction can be aided by an analysis of mutation operators, but this requires the definition of specialized metrics. Several metrics have been proposed before, although their effectiveness and relative merits are not easy to assess. A step ahead in the evaluation of mutation-reduction techniques would be a better metric to determine objectively the quality of a set of mutants with respect to a given test suite. This work introduces such a metric, which is naturally extended to mutation operators and may be used to reduce the number of mutants, particularly of equivalent mutants. Finally, a firm mutation analysis tool for WS-BPEL service compositions is presented, together with experimental results obtained by comparing different metrics on several compositions. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The BESTEST approach enables the use of machine learning algorithms to augment standard syntactic testing approaches and shows how search‐based testing techniques can be applied to generate test sets with respect to this criterion.
Abstract: Identifying a finite test set that adequately captures the essential behaviour of a program such that all faults are identified is a well-established problem. This is traditionally addressed with syntactic adequacy metrics (e.g., branch coverage), but these can be impractical and may be misleading even if they are satisfied. One intuitive notion of adequacy, which has been discussed in theoretical terms over the past three decades, is the idea of behavioural coverage: If it is possible to infer an accurate model of a system from its test executions, then the test set can be deemed to be adequate. Despite its intuitive basis, it has remained almost entirely in the theoretical domain because inferred models have been expected to be exact (generally an infeasible task) and have not allowed for any pragmatic interim measures of adequacy to guide test set generation. This paper presents a practical approach to incorporate behavioural coverage. Our BESTEST approach (1) enables the use of machine learning algorithms to augment standard syntactic testing approaches and (2) shows how search-based testing techniques can be applied to generate test sets with respect to this criterion. An empirical study on a selection of Java units demonstrates that test sets with higher behavioural coverage significantly outperform current baseline test criteria in terms of detected faults. © 2015 The Authors. Software Testing, Verification and Reliability published by John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper empirically analyzes the effect of covering mutants through the mutant schema improved with extra code (MUSIC) technique and concludes that analyzing the covered mutants reduces the execution cost of mutation testing and its application is therefore recommended.
Abstract: Mutation testing is a very effective testing technique that creates mutants in order to design test cases that will kill the mutants. One problem of mutation testing is its high cost: creating mutants, executing them and calculating the mutation score. This paper empirically analyzes the effect of covering mutants through the mutant schema improved with extra code (MUSIC) technique. This technique annotates the statements covered by the tests in the original system in order to filter the mutant executions, because tests are only executed against the mutants whose mutated statement is covered by the tests. Therefore, MUSIC is meant to reduce the number of required executions and identify infinite loops at a reduced cost. In addition, an experiment was performed to evaluate the advantages and disadvantages of analyzing the covered mutants. As a result, we conclude that analyzing the covered mutants reduces the execution cost of mutation testing, and its application is therefore recommended. Copyright © 2014 John Wiley & Sons, Ltd.
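A minimal sketch of the filtering idea behind MUSIC (coverage data and identifiers are hypothetical, and this is not the tool's implementation): a mutant only needs to be executed against the tests that cover its mutated statement, because a test that never reaches the mutated statement cannot kill the mutant.

```python
# Sketch: select only the (mutant, test) executions whose mutated statement is covered.
def executions_to_run(mutants, test_coverage):
    """mutants: mutant id -> mutated line; test_coverage: test -> lines covered on the original."""
    for mutant_id, mutated_line in mutants.items():
        for test, covered_lines in test_coverage.items():
            if mutated_line in covered_lines:
                yield mutant_id, test

mutants = {"m1": 10, "m2": 42}
test_coverage = {"t1": {10, 11}, "t2": {42, 43}}
print(list(executions_to_run(mutants, test_coverage)))  # [('m1', 't1'), ('m2', 't2')]
```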

Journal ArticleDOI
TL;DR: The results of two empirical studies reveal a potential opportunity for creating a more cost‐effective hybrid augmentation approach leveraging both concolic and genetic test case generation techniques, while appropriately utilizing the understanding of the factors that affect them.
Abstract: Test suite augmentation techniques are used in regression testing to identify code elements in a modified program that are not adequately tested and to generate test cases to cover those elements. A defining feature of test suite augmentation techniques is the potential for reusing existing regression test suites. Our preliminary work suggests that several factors influence the efficiency and effectiveness of augmentation techniques that perform such reuse. These include the order in which target code elements are considered while generating test cases, the manner in which existing regression test cases and newly generated test cases are used, and the algorithm used to generate test cases. In this work, we present the results of two empirical studies examining these factors, considering two test case generation algorithms (concolic and genetic). The results of our studies show that the primary factor affecting augmentation using these approaches is the test case generation algorithm utilized; this affects both cost and effectiveness. The manner in which existing and newly generated test cases are utilized also has a substantial effect on efficiency and in some cases a substantial effect on effectiveness. The order in which target code elements are considered turns out to have relatively few effects when using concolic test case generation but in some cases influences the efficiency of genetic test case generation. The results of our first study, on four relatively small programs using a large number of test suites, are supported by our second study of a much larger program available in multiple versions. Together, the studies reveal a potential opportunity for creating a more cost-effective hybrid augmentation approach leveraging both concolic and genetic test case generation techniques, while appropriately utilizing our understanding of the factors that affect them. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The results indicate that the metrics are moderate to strong predictors of testing effectiveness and effective at providing test generation targets, and highlight the need for additional work on concurrency coverage metrics.
Abstract: Testing multithreaded programs is inherently challenging, as programs can exhibit numerous thread interactions. To help engineers test these programs cost-effectively, researchers have proposed concurrency coverage metrics. These metrics are intended to be used as predictors for testing effectiveness and provide targets for test generation. The effectiveness of these metrics, however, remains largely unexamined. In this work, we explore the impact of concurrency coverage metrics on testing effectiveness and examine the relationship between coverage, fault detection, and test suite size. We study eight existing concurrency coverage metrics and six new metrics formed by combining complementary metrics. Our results indicate that the metrics are moderate to strong predictors of testing effectiveness and effective at providing test generation targets. Nevertheless, metric effectiveness varies across programs, and even combinations of complementary metrics do not consistently provide effective testing. These results highlight the need for additional work on concurrency coverage metrics. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A formal model of both stateless and stateful firewalls (packet filters), including NAT, is presented to which a specification‐based conformance test case generation approach is applied and a verified optimisation technique for this approach is presented.
Abstract: Firewalls are an important means to secure critical ICT infrastructures. As firewalls are configurable off-the-shelf products, their effectiveness crucially depends on both the correctness of the implementation itself and the correct configuration. While testing the implementation can be done once by the manufacturer, the configuration needs to be tested for each application individually. This is particularly challenging as the configuration, implementing a firewall policy, is inherently complex, hard to understand, administrated by different stakeholders and thus difficult to validate. This paper presents a formal model of both stateless and stateful firewalls (packet filters), including NAT, to which a specification-based conformance test case generation approach is applied. Furthermore, a verified optimisation technique for this approach is presented: starting from a formal model for stateless firewalls, a collection of semantics-preserving policy transformation rules and an algorithm that optimizes the specification with respect to the number of test cases required for path coverage of the model are derived. We extend an existing approach that integrates verification and testing (that is, tests and proofs) to support conformance testing of network policies. The presented approach is supported by a test framework that allows actual firewalls to be tested using the test cases generated on the basis of the formal model. Finally, a report on several larger case studies is presented. Copyright © 2014 John Wiley & Sons, Ltd.
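One example of the kind of semantics-preserving policy transformation mentioned above, sketched with an abstract packet-set representation (an assumption made for illustration): removing rules that are completely shadowed by earlier rules leaves the first-match behaviour unchanged while shrinking the model that test cases must cover.

```python
# Sketch: drop rules whose matching packets are all matched by earlier rules (first-match policy).
def remove_shadowed(policy):
    """policy: ordered list of (set_of_matching_packets, action)."""
    seen, kept = set(), []
    for match, action in policy:
        if not match - seen:          # every packet already handled earlier: rule is shadowed
            continue
        kept.append((match, action))
        seen |= match
    return kept

policy = [({1, 2}, "deny"), ({2}, "allow"), ({3}, "allow")]
print(remove_shadowed(policy))        # [({1, 2}, 'deny'), ({3}, 'allow')]
```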

Journal ArticleDOI
TL;DR: This article presents a generic approach for the automated detection of faults in variability analysis tools overcoming the oracle problem, and enables the generation of random variability models together with the exact set of valid configurations represented by these models.
Abstract: Variability determines the capability of software applications to be configured and customized. A common need during the development of variability-intensive systems is the automated analysis of their underlying variability models, for example, detecting contradictory configuration options. The analysis operations that are performed on variability models are often very complex, which hinders the testing of the corresponding analysis tools and makes it difficult, often infeasible, to determine the correctness of their outputs, that is, the well-known oracle problem in software testing. In this article, we present a generic approach for the automated detection of faults in variability analysis tools overcoming the oracle problem. Our work enables the generation of random variability models together with the exact set of valid configurations represented by these models. These test data are generated from scratch using stepwise transformations and assuring that certain constraints (a.k.a. metamorphic relations) hold at each step. To show the feasibility and generalizability of our approach, it has been used to automatically test several analysis tools in three variability domains: feature models, common upgradeability description format documents and Boolean formulas. Among other results, we detected 19 real bugs in 7 out of the 15 tools under test. Copyright © 2015 John Wiley & Sons, Ltd.
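A minimal sketch of the metamorphic construction described above (the model representation and operations are assumptions, not the authors' tooling): the variability model is grown stepwise while the exact set of valid configurations is maintained alongside it, so the expected answer of an analysis tool, such as the number of products, is known without an external oracle.

```python
# Sketch: grow a feature model step by step, tracking its exact set of valid configurations.
def add_mandatory(configs, feature):
    """A mandatory child of an always-present parent appears in every configuration."""
    return [base | {feature} for base in configs]

def add_optional(configs, feature):
    """An optional child may or may not appear in each configuration."""
    return [c for base in configs for c in (base, base | {feature})]

configs = [frozenset({"root"})]            # seed model: just the root feature
configs = add_mandatory(configs, "engine")
configs = add_optional(configs, "gps")
configs = add_optional(configs, "radio")

expected_products = len(configs)           # known by construction: 4
print(expected_products)                   # compare against the analysis tool's output
```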

Journal ArticleDOI
TL;DR: A methodology and a supporting tool for evaluating the fault detection capability of a NuSMV model advisor are presented, which performs an automatic static model review of NuSMV models based on the concept of equivalent Kripke structures.
Abstract: Among validation techniques, model review is a static analysis approach that can be performed at the early stages of software development, at the specification level, and aims at determining whether a model has certain quality attributes like completeness, consistency and minimality. However, the capability of model review to detect behavioural faults has never been measured. In this paper, a methodology and a supporting tool are presented for evaluating the fault detection capability of a NuSMV model advisor, which performs an automatic static model review of NuSMV models. The approach is based on the use of mutation in a similar way as in mutation testing: several mutation operators for NuSMV models are defined, and the model advisor is used to detect behavioural faults by statically analysing mutated specifications. In this way, it is possible to measure the model advisor's ability to discover faults. To improve the quality of the analysis, the equivalence between a NuSMV model and any of its mutants must be checked. To perform this task, this paper proposes a technique based on the concept of equivalent Kripke structures, as NuSMV models are Kripke structures. A number of experiments assess the fault-detecting capability, precision and accuracy of the proposed approach. Analysis of variance is used to check whether the results are statistically significant. Some relationships among mutation operators and model quality attributes are also established. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: Multiple results recently achieved by the authors in the area of noise-injection-based testing are presented in a unified and extended way, together with a novel use of the genetic algorithm for finding suitable combinations of the many parameters of tests and noise techniques.
Abstract: Testing of concurrent software written in programming languages like Java and C/C++ is a highly challenging task owing to the many possible interactions among threads. A simple, cheap, and effective approach that addresses this challenge is testing with noise injection, which influences the scheduling so that different interleavings of concurrent actions are witnessed. In this paper, multiple results achieved recently in the area of noise-injection-based testing by the authors are presented in a unified and extended way. In particular, various concurrency coverage metrics are presented first. Then, multiple heuristics for solving the noise placement problem (i.e., where and when to generate noise) as well as the noise seeding problem (i.e., how to generate the noise) are introduced and experimentally evaluated. In addition, several new heuristics are proposed and included into the evaluation too. Recommendations on how to set up noise-based testing for particular scenarios are then given. Finally, a novel use of the genetic algorithm for finding suitable combinations of the many parameters of tests and noise techniques is presented. Copyright © 2014 John Wiley & Sons, Ltd.
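A bare-bones sketch of a noise-injection primitive (probabilities and sleep bounds are arbitrary assumptions; the paper's infrastructure targets Java and C/C++): the noise placement problem decides where calls like this are inserted into the programme under test, and the noise seeding problem decides what the call actually does, for example sleeping or merely yielding.

```python
import random
import time

def inject_noise(probability=0.2, max_sleep_ms=5):
    """Possibly disturb the scheduler at this programme location to force rare interleavings."""
    if random.random() < probability:
        if random.random() < 0.5:
            time.sleep(random.uniform(0, max_sleep_ms) / 1000.0)  # delay the current thread
        else:
            time.sleep(0)                                         # give up the time slice only
```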

Journal ArticleDOI
TL;DR: A model‐based testing approach, relying on the mutation of HLPSL models to generate abstract test cases to introduce leaks in the security protocols and represent real‐world implementation errors, is presented.
Abstract: In recent years, important efforts have been made for offering a dedicated language for modelling and verifying security protocols. An outcome of the European project AVISPA, the high-level security protocol language HLPSL aims at providing a means for verifying usual security properties such as data secrecy in message exchanges between agents. However, verifying the security protocol model does not guarantee that the actual implementation of the protocol will fulfil these properties. This article presents a model-based testing approach, relying on the mutation of HLPSL models to generate abstract test cases. The proposed mutations aim at introducing leaks in the security protocols and represent real-world implementation errors. The mutated models are then analysed by the AVISPA (Automated Validation of Internet Security Protocols and Applications) tool set, which produces, when the mutant protocol is declared unsafe, counterexample traces exploiting the security flaws and, thus, providing test cases. A dedicated framework is then used to concretize the abstract attack traces, bridging the gap between the formal model level and the implementation level. This model-based testing technique has been exercised on a wide range of security protocols, in order to evaluate the mutation operators. This process has also been fully tool-supported, from the mutation of the HLPSL model to the concretization of the abstract test cases into test scripts. It has been applied to a realistic case study of the Paypal payment protocol, which made it possible to discover a vulnerability in an implementation of an e-commerce framework. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A new composite approach that uses reachability testing to guide the selection of the synchronization sequences to be tested according to a specific structural testing criterion is presented and empirically evaluated in the context of message-passing concurrent programs developed with MPI.
Abstract: Testing is a key activity to assure the quality of concurrent applications. In recent years, a variety of different mechanisms have been proposed to test concurrent software. However, a persistent problem is the high testing cost because of the large number of different synchronization sequences that must be tested. When structural testing criteria are adopted, a large number of infeasible synchronization sequences is generated, increasing the testing cost. Although the use of reachability testing reduces the number of infeasible combinations because only feasible synchronization sequences are generated, many synchronization combinations are also generated, and this again results in a testing cost with exponential behavior. This paper presents a new composite approach that uses reachability testing to guide the selection of the synchronization sequences to be tested according to a specific structural testing criterion. This new composite approach is empirically evaluated in the context of message-passing concurrent programs developed with MPI. The experimental study evaluates both the cost and effectiveness of the proposed composite approach in comparison with traditional reachability testing and structural testing. The results confirm that the use of the new composite approach has advantages for the testing of concurrent applications. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: An extensive empirical study of CHECK-THEN-ACT idioms of Java (Oracle Corporation, Redwood, CA, USA) concurrent collections is presented; it discovered 60 bugs that were confirmed and fixed by developers.
Abstract: Concurrent collections are widely used in concurrent programs. However, programmers can misuse these concurrent collections when composing two operations where a check on the collection (e.g., the collection contains an element) precedes an action (e.g., inserting an element). Unless the whole composition is atomic, the program contains an atomicity violation bug. This paper presents an extensive empirical study of CHECK-THEN-ACT idioms of Java (Oracle Corporation, Redwood, CA, USA) concurrent collections. We analyze 28 widely used open-source Java projects comprising 6.4 million lines of code that use Java concurrent collections. We study the correct and incorrect use of idioms and the evolution of the programs with respect to idioms. Our tool, CTADETECTOR, detects and corrects misused idioms. CTADETECTOR discovered 60 bugs that were confirmed and fixed by developers. This shows that CHECK-THEN-ACT idioms are commonly misused in practice, and correcting them is important. Copyright © 2015 John Wiley & Sons, Ltd.
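The idiom is easy to picture with a small sketch (shown in Python for brevity; the paper studies Java concurrent collections, and a shared dict stands in here): unless the check and the subsequent action execute atomically, another thread can interleave between them and break the intended invariant.

```python
import threading

counts = {}
lock = threading.Lock()

def buggy_increment(key):
    # CHECK (key not present) ... THEN ACT (insert): another thread may initialise the
    # same key between the two steps, so one initialisation can be lost.
    if key not in counts:
        counts[key] = 0
    counts[key] += 1

def fixed_increment(key):
    # Make the whole check-then-act composition atomic.
    with lock:
        counts[key] = counts.get(key, 0) + 1
```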

Journal ArticleDOI
TL;DR: This work devised six reduction techniques and comparatively evaluated them by measuring the following: reduction rate; information loss; impact on two applications of dynamic program analysis, namely, cluster‐based test suite minimization (App‐I), and profile‐based online failure and intrusion detection ( App‐II).
Abstract: The interest in leveraging data mining and statistical techniques to enable dynamic program analysis has increased tremendously in recent years. Researchers have presented numerous techniques that mine and analyze execution profiles to assist software testing and other reliability enhancing approaches. Previous empirical studies have shown that the effectiveness of such techniques is likely to be impacted by the type of profiled program elements. This work further studies the impact of the characteristics of execution profiles by focusing on their size; noting that a typical profile comprises a large number of program elements, in the order of thousands or higher. Specifically, the authors devised six reduction techniques and comparatively evaluated them by measuring the following: (1) reduction rate; (2) information loss; (3) impact on two applications of dynamic program analysis, namely, cluster-based test suite minimization (App-I), and profile-based online failure and intrusion detection (App-II). The results were promising, as follows: (a) the average reduction rate ranged from 92% to 98%; (b) three techniques were lossless and three were slightly lossy; (c) reducing execution profiles exhibited a major positive impact on the effectiveness and efficiency of App-I; and (d) reduction exhibited a positive impact on the efficiency of App-II, but a minor negative impact on its effectiveness. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This special issue on mutation testing contains nine papers, including four extended versions of papers presented at the 7th International Workshop on Mutation Analysis and five new submissions; it introduces novel techniques to optimize mutant execution, to reduce redundant mutants and to detect equivalent mutants.
Abstract: It is our pleasure to introduce this special issue on Mutation Testing. The special issue contains nine papers, including four extended versions of papers presented at the 7th International Workshop on Mutation Analysis and five new submissions. We have divided the special issue into three broad areas based on the topics covered. The first area focuses on the techniques for making mutation testing more efficient and practical; the second area revisits some fundamental questions about mutants, whilst the third area presents some advanced applications of mutation testing for model-based testing. Mutation Testing has been proven to be an effective way to measure the quality of a test suite in terms of its ability to detect faults [1]. The history of mutation testing can be traced back to 1971 in a publication by Richard Lipton [2] as well as in publications from the late 1970s by DeMillo et al. [3] and Hamlet [4]. In Mutation Testing, faults are deliberately seeded into the original program (by simple syntactic changes) to create a set of faulty programs called mutants, each containing a different syntactic change. The general principle underpinning Mutation Testing is that artificial faults can be used to represent common programming mistakes. By carefully choosing the location within the program and the types of faults, it is possible to simulate any test adequacy criteria whilst providing improved fault detection. A recent survey on mutation testing provides evidence to suggest that the approach is increasing in maturity and practical application [5]. One reason why mutation testing has become a popular testing approach is that it is a straightforward process to apply. To assess the quality of a given test set, the generated mutants are executed against the input test set. If the result of running a mutant is different from the result of running the original program for any test cases in the input test set, the seeded fault denoted by the mutant is detected. One outcome of the Mutation Testing process is the mutation score, which indicates the quality of the input test set. The mutation score is the ratio of the number of detected faults over the total number of seeded faults. Mutation Testing has been widely adopted in the academic community as a means to evaluate software testing techniques [6], as well as to generate tests and test oracles [7, 8]. However, it still suffers from a number of problems that prevent the wider industrial uptake of this effective testing approach. One problem that prevents Mutation Testing from becoming a practical testing technique is the high computational cost of executing a large number of mutants against a test set. Other problems are related to the amount of effort involved in identifying equivalent mutants. Each submission received three reviews from a board of 36 mutation testing experts. For all submissions extended from the mutation workshop, we have recruited at least one new reviewer to ensure wider accessibility to a non-mutation expert testing audience. The first area covers the topic of making mutation testing more efficient and practical. The three papers in this area introduce novel techniques to optimize mutant execution, to reduce redundant mutants and to detect equivalent mutants. In the first paper ‘Reducing Mutation Costs Through Uncovered Mutants’, Pedro Reales Mateo and Macario Polo Usaola propose an improved mutant schema, namely, ‘MUSIC’ to reduce the execution cost for mutation testing. 
The MUSIC approach records runtime information about structural and mutation coverage; it reduces the execution cost by removing mutant execution tasks that are not covered by the test cases. The second paper ‘Higher Accuracy and Lower Run Time: Efficient Mutation Analysis using Non-redundant Mutation Operators’ by René Just and Franz Schweiggert attempts to reduce the number of mutants by applying only non-redundant mutation operators. The authors identified a set of operators that tend not to generate any redundant mutants, and their results show that 20% of the runtime cost could be saved using the selected operators. The third paper ‘Employing Second-order Mutation for

Journal ArticleDOI
TL;DR: This issue presents two useful applications of formal modeling: specification guidelines for designing models whose state spaces remain small enough for testing and verification, and a combined test-and-proof approach to firewall conformance testing.
Abstract: This issue presents two useful applications of formal modeling. The first paper, Specification guidelines to avoid the state space explosion problem, by Groote, Kouters, and Osaiweran, is based on years of experience building state-based models. The authors have found that some models make testing and verification much easier than others, and give guidance for designing useful models. (Recommended by Alan Hartman.) The second paper, Formal firewall conformance testing: An application of test and proof techniques, by Brucker, Brügger, and Wolff, gives a technique for security testing of firewalls. The authors formally model firewalls, and then combine testing with a proof technique to verify the correctness of the implemented firewall. (Recommended by Ronald Olsson.)

Journal ArticleDOI
TL;DR: This issue contains three papers that invent new ideas and evaluate them empirically, including a technique for solving the oracle problem when testing tools that analyze the variability of software.
Abstract: This issue contains three papers that invent new ideas and evaluate them empirically. The first paper, Directed test suite augmentation: An empirical investigation, by Xu, Kim, Kim, Cohen, and Rothermel, empirically investigates the effectiveness of strategies for augmenting test sets after changes to software. They compared two different test generation algorithms in two separate studies. (Recommended by Paul Ammann.) The second paper, Reducing execution profiles: Techniques and benefits, by Farjo, Assi, and Masri, presents results of analysis of execution profiles. They invented six ways to reduce the size of execution profiles and empirically measured the effect on the quality of analysis after reducing the profiles. (Recommended by T.Y. Chen.) The third paper, Automated metamorphic testing of variability analysis tools, by Segura, Durán, Sánchez, Le Berre, Lonca, and Ruiz-Cortés, invents a technique for solving the oracle problem when testing tools that analyze the variability of software. (Recommended by T.H. Tse.) Combined, these three papers have a whopping 14 co-authors, which leads perfectly into the subject of this editorial: determining authorship.