
Showing papers on "Test suite" published in 2023


Journal ArticleDOI
TL;DR: In this article, a new test suite and a new optimizer framework are proposed to further promote the research of evolutionary large-scale multiobjective optimization (ELMO); more realistic features are considered in the new benchmarks, such as mixed formulation of objective functions, mixed linkages in variables, and imbalanced contributions of variables to the objectives.
Abstract: Evolutionary large-scale multiobjective optimization (ELMO) has received increasing attention in recent years. This study has compared various existing optimizers for ELMO on different benchmarks, revealing that both benchmarks and algorithms for ELMO still need significant improvement. Thus, a new test suite and a new optimizer framework are proposed to further promote the research of ELMO. More realistic features are considered in the new benchmarks, such as mixed formulation of objective functions, mixed linkages in variables, and imbalanced contributions of variables to the objectives, which are challenging to the existing optimizers. To better tackle these benchmarks, a variable group-based learning strategy is embedded into the new optimizer framework for ELMO, which significantly improves the quality of reproduction in large-scale search space. The experimental results validate that the designed benchmarks can comprehensively evaluate the performance of existing optimizers for ELMO and the proposed optimizer shows distinct advantages in tackling these benchmarks.
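The "imbalanced contributions of variables to the objectives" feature can be illustrated with a toy objective. This is a hypothetical sketch, not the actual benchmark definition; the quadratic form, group size `k`, and `weight` are invented for illustration:

```python
# Toy illustration (not the paper's benchmark): an objective in which a small
# "important" variable group contributes far more than the rest, a feature the
# new ELMO benchmarks use to challenge optimizers that treat variables uniformly.
def imbalanced_objective(x, k=10, weight=100.0):
    # The first k variables carry most of the objective value; the remaining
    # variables contribute only weakly, so effort spent on them is largely wasted.
    important = sum(v * v for v in x[:k])
    rest = sum(v * v for v in x[k:])
    return weight * important + rest

print(imbalanced_objective([1.0] * 1000))  # 100*10 + 990 = 1990.0
```

A variable group-based learning strategy, as in the proposed optimizer framework, would aim to discover such groups rather than perturbing all 1000 variables alike.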

11 citations


Journal ArticleDOI
TL;DR: In this article, two evolutionary approaches are introduced to prioritize test cases using diversity metrics (black-box heuristics) computed on static features of the roads designed to be used within the driving scenarios.
Abstract: Testing with simulation environments helps to identify critical failing scenarios for self-driving cars (SDCs). Simulation-based tests are safer than in-field operational tests and allow detecting software defects before deployment. However, these tests are very expensive and are too many to be run frequently within limited time constraints. In this article, we investigate test case prioritization techniques to increase the ability to detect SDC regression faults with virtual tests earlier. Our approach, called SDC-Prioritizer, prioritizes virtual tests for SDCs according to static features of the roads we designed to be used within the driving scenarios. These features can be collected without running the tests, which means that they do not require past execution results. We introduce two evolutionary approaches to prioritize the test cases using diversity metrics (black-box heuristics) computed on these static features. These two approaches, called SO-SDC-Prioritizer and MO-SDC-Prioritizer, use single-objective and multi-objective genetic algorithms (GA), respectively, to find trade-offs between executing the less expensive tests and the most diverse test cases earlier. Our empirical study conducted in the SDC domain shows that MO-SDC-Prioritizer significantly (p-value <= 0.1e-10) improves the ability to detect safety-critical failures at the same level of execution time compared to baselines: random and greedy-based test case orderings. Besides, our study indicates that multi-objective meta-heuristics outperform single-objective approaches when prioritizing simulation-based tests for SDCs. MO-SDC-Prioritizer prioritizes test cases with a large improvement in fault detection while its overhead (up to 0.45% of the test execution cost) is negligible.
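Diversity-driven prioritization can be sketched with a greedy, deterministic stand-in for the paper's GA search. The feature vectors and the maximin heuristic below are assumptions for illustration, not SDC-Prioritizer's actual algorithm:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diversity_prioritize(features):
    """Greedy stand-in for a diversity-based prioritizer: repeatedly pick the
    test whose static road features are farthest from those already chosen,
    so diverse scenarios run first. `features` maps test id -> feature vector."""
    remaining = dict(features)
    n = len(features)
    dim = len(next(iter(features.values())))
    # Start from the test farthest from the centroid of all tests.
    centroid = [sum(v[i] for v in features.values()) / n for i in range(dim)]
    first = max(remaining, key=lambda t: euclidean(remaining[t], centroid))
    order = [first]
    del remaining[first]
    while remaining:
        # Maximin: the next test maximizes its minimum distance to those selected.
        nxt = max(remaining,
                  key=lambda t: min(euclidean(remaining[t], features[s])
                                    for s in order))
        order.append(nxt)
        del remaining[nxt]
    return order

order = diversity_prioritize({"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [10.0, 10.0]})
print(order)  # ['c', 'a', 'b'] — the near-duplicate of "a" is deferred to last
```

A GA, as in SO-/MO-SDC-Prioritizer, searches over whole orderings instead of building one greedily, and can additionally trade diversity off against execution cost.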

7 citations


Journal ArticleDOI
TL;DR: In this article, a new metaheuristic algorithm called green anaconda optimization (GAO), which imitates the natural behavior of green anacondas, is designed.
Abstract: A new metaheuristic algorithm called green anaconda optimization (GAO) which imitates the natural behavior of green anacondas has been designed. The fundamental inspiration for GAO is the mechanism of recognizing the position of the female species by the male species during the mating season and the hunting strategy of green anacondas. GAO’s mathematical modeling is presented based on the simulation of these two strategies of green anacondas in two phases of exploration and exploitation. The effectiveness of the proposed GAO approach in solving optimization problems is evaluated on twenty-nine objective functions from the CEC 2017 test suite and the CEC 2019 test suite. The efficiency of GAO in providing solutions for optimization problems is compared with the performance of twelve well-known metaheuristic algorithms. The simulation results show that the proposed GAO approach has a high capability in exploration, exploitation, and creating a balance between them and performs better compared to competitor algorithms. In addition, the implementation of GAO on twenty-one optimization problems from the CEC 2011 test suite indicates the effective capability of the proposed approach in handling real-world applications.

4 citations


Journal ArticleDOI
TL;DR: In this article, the authors demonstrate the applicability of the African Buffalo Optimization approach to test case selection and prioritization, which converges in polynomial time (O(n^2)).
Abstract: Software needs modifications and requires revisions regularly. Owing to these revisions, retesting software becomes essential to ensure that the enhancements made have not affected its bug-free functioning. The time and cost incurred in this process need to be reduced by the method of test case selection and prioritization. It is observed that many nature-inspired techniques are applied in this area. African Buffalo Optimization is one such approach, applied to regression test selection and prioritization. In this paper, the proposed work explains and proves the applicability of the African Buffalo Optimization approach to test case selection and prioritization. The proposed algorithm converges in polynomial time (O(n^2)). In this paper, the empirical evaluation of applying African Buffalo Optimization for test case prioritization is done on a sample data set with multiple iterations. An astounding 62.5% drop in size and a 48.57% drop in the runtime of the original test suite were recorded. The obtained results are compared with Ant Colony Optimization. The comparative analysis indicates that African Buffalo Optimization and Ant Colony Optimization exhibit similar fault detection capabilities (80%), and a reduction in the overall execution time and size of the resultant test suite. The results and analysis, hence, advocate and encourage the use of African Buffalo Optimization in the area of test case selection and prioritization.
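The reported drops are plain relative reductions. The 40-to-15 counts below are hypothetical, chosen only to reproduce the reported 62.5% size drop:

```python
def percent_drop(original, reduced):
    """Relative reduction (%), as reported for suite size and runtime."""
    return 100.0 * (original - reduced) / original

# Hypothetical counts: a 40-test suite reduced to 15 tests.
print(percent_drop(40, 15))  # 62.5
```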

4 citations



Journal ArticleDOI
TL;DR: In this paper, a benchmark test suite for large-scale subset selection from large candidate solution sets is presented, along with a comparison of some representative methods using the proposed test suite; it provides a baseline for researchers to understand, use, compare, and develop subset selection methods in the EMO field.

3 citations


Journal ArticleDOI
TL;DR: In this paper , a collection of multi-objective real-world problems taken from different disciplines are suggested to complement the performance evaluation of evolutionary algorithms considering real-life applications without the need to be an expert on the concerned disciplines in which they lie.

2 citations


Journal ArticleDOI
TL;DR: In this article, a multi-objective fitness dependent optimizer (MOFDO) is proposed, which is equipped with all five types of knowledge (situational, normative, topographical, domain, and historical knowledge) as in FDO.
Abstract: This paper proposes the multi-objective variant of the recently-introduced fitness dependent optimizer (FDO). The algorithm is called a multi-objective fitness dependent optimizer (MOFDO) and is equipped with all five types of knowledge (situational, normative, topographical, domain, and historical knowledge) as in FDO. MOFDO is tested on two standard benchmarks to demonstrate its performance: the classical ZDT test functions, a widespread test suite that takes its name from its authors Zitzler, Deb, and Thiele, and the IEEE Congress on Evolutionary Computation benchmark (CEC-2019) multi-modal multi-objective functions. MOFDO results are compared to the latest variant of multi-objective particle swarm optimization, the non-dominated sorting genetic algorithm third improvement (NSGA-III), and the multi-objective dragonfly algorithm. The comparative study shows the superiority of MOFDO in most cases and comparable results in other cases. Moreover, MOFDO is used for optimizing real-world engineering problems (e.g., welded beam design problems). It is observed that the proposed algorithm successfully provides a wide variety of well-distributed feasible solutions, which enables decision-makers to have more applicable and comfortable choices to consider.
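The ZDT functions mentioned above are compact enough to state directly. A sketch of ZDT1, the first of the classical Zitzler-Deb-Thiele bi-objective problems (standard formulation, typically with 30 decision variables in [0, 1]):

```python
import math

def zdt1(x):
    """ZDT1 bi-objective test function (Zitzler, Deb, Thiele).
    The Pareto-optimal front is reached when g(x) = 1, i.e. x_2..x_n = 0."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2

print(zdt1([0.0] * 30))  # (0.0, 1.0) — a Pareto-optimal point
```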

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the challenges and typical issues with assessing the specifications of behavioural modelling language semantics, and make recommendations for improving the development of future modelling languages by representing the semantic domain and traces more explicitly, applying diverse test design techniques to obtain conformance test suites, and using various tools to support early-phase language design.
Abstract: Modelling languages play a central role in developing complex, critical systems. A precise, comprehensible, and high-quality modelling language specification is essential to all stakeholders using, implementing, or extending the language. Many good practices can be found that improve the understandability or consistency of the languages' semantics. However, designing a modelling language intended for a large audience is still challenging. In this paper, we investigate the challenges and typical issues with assessing the specifications of behavioural modelling language semantics. Our key insight is that the various stakeholders' understandings of the language's semantics are often misaligned, and the semantics defined in various artefacts (simulators, test suites) are inconsistent. Therefore, assessment of semantics should focus on identifying and resolving these inconsistencies. To illustrate these challenges and techniques, we assessed parts of a state-of-the-art specification for a general-purpose modelling language, the Precise Semantics of UML State Machines (PSSM). We reviewed the text of the specification, analysed and executed PSSM's conformance test suite, and categorised our experiences according to questions generally relevant to modelling languages. Finally, we made recommendations for improving the development of future modelling languages by representing the semantic domain and traces more explicitly, applying diverse test design techniques to obtain conformance test suites, and using various tools to support early-phase language design.

1 citation


Journal ArticleDOI
TL;DR: In this paper, a systematic review study is conducted that intends to provide an unbiased viewpoint about TSR based on various types of search algorithms, including evolutionary-based, swarm intelligence-based, human-based, and physics-based.
Abstract: Regression testing has remained a promising research area for the last few decades. It is a type of testing that aims at ensuring that recent modifications have not adversely affected the software product. After the introduction of a new change in the system under test, the number of test cases significantly increases to handle the modification. Consequently, it becomes prohibitively expensive to execute all of the generated test cases within the allocated testing time and budget. To address this situation, the test suite reduction (TSR) technique is widely used, which focuses on finding a representative test suite without compromising its effectiveness, such as fault-detection capability. In this work, a systematic review study is conducted that intends to provide an unbiased viewpoint about TSR based on various types of search algorithms. The study's main objective is to examine and classify the current state-of-the-art approaches used in search-based TSR contexts. To achieve this, a systematic review protocol is adopted and the most relevant primary studies (57 out of 210) published between 2007 and 2022 are selected. Existing search-based TSR approaches are classified into five main categories, including evolutionary-based, swarm intelligence-based, human-based, physics-based, and hybrid, grounded on the type of employed search algorithm. Moreover, the current work reports the parameter settings according to their category, the type of considered operator(s), and the probabilistic rate that significantly impacts the quality of the obtained solution. Furthermore, this study describes the comparison baseline techniques that support the empirical comparison regarding the cost-effectiveness of a search-based TSR approach. Finally, it is concluded that search-based TSR has great potential to optimally solve the TSR problem. In this regard, several potential research directions are outlined as useful for future researchers interested in conducting research in the TSR domain.

1 citation


Journal ArticleDOI
TL;DR: Flakify as discussed by the authors is a black-box, language-model-based predictor for flaky test cases, which relies exclusively on the source code of test cases and does not require access to production code or pre-defined features.
Abstract: Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., passing and failing across executions, even for the same version of the source code. Flaky test cases introduce overhead to software development as they can lead to unnecessary attempts to debug production or testing code. Besides rerunning test cases multiple times, which is time-consuming and computationally expensive, flaky test cases can be predicted using machine learning (ML) models, thus reducing the wasted cost of re-running and debugging these test cases. However, the state-of-the-art ML-based flaky test case predictors rely on pre-defined sets of features that are either project-specific, i.e., inapplicable to other projects, or require access to production code, which is not always available to software test engineers. Moreover, given the non-deterministic behavior of flaky test cases, it can be challenging to determine a complete set of features that could potentially be associated with test flakiness. Therefore, in this article, we propose Flakify, a black-box, language-model-based predictor for flaky test cases. Flakify relies exclusively on the source code of test cases, thus requiring no (a) access to production code (black-box), (b) rerunning of test cases, or (c) pre-defined features. To this end, we employed CodeBERT, a pre-trained language model, and fine-tuned it to predict flaky test cases using the source code of test cases. We evaluated Flakify on two publicly available datasets (FlakeFlagger and IDoFT) for flaky test cases and compared our technique with the FlakeFlagger approach, the best state-of-the-art ML-based, white-box predictor for flaky test cases, using two different evaluation procedures: (1) cross-validation and (2) per-project validation, i.e., prediction on new projects.
Flakify achieved F1-scores of 79% and 73% on the FlakeFlagger dataset using cross-validation and per-project validation, respectively. Similarly, Flakify achieved F1-scores of 98% and 89% on the IDoFT dataset using the two validation procedures, respectively. Further, Flakify surpassed FlakeFlagger by 10 and 18 percentage points (pp) in terms of precision and recall, respectively, when evaluated on the FlakeFlagger dataset, thus reducing the cost otherwise wasted on unnecessarily debugging test cases and production code by the same percentages (corresponding to reduction rates of 25% and 64%). Flakify also achieved significantly higher prediction results when used to predict test cases on new projects, suggesting better generalizability over FlakeFlagger. Our results further show that a black-box version of FlakeFlagger is not a viable option for predicting flaky test cases.
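The link between percentage-point gains and reduction rates is simple arithmetic: wasted debugging effort scales with 1 - precision, and missed flaky tests with 1 - recall. The baseline values below (60% precision, 72% recall) are hypothetical, chosen only so the arithmetic reproduces the reported 25% and 64% reduction rates:

```python
def wasted_cost_reduction(baseline, improved):
    """Relative reduction in the 'wasted' share (1 - metric) when a
    precision or recall metric improves from `baseline` to `improved`."""
    return (improved - baseline) / (1.0 - baseline)

# Hypothetical baselines chosen to reproduce the reported reduction rates:
print(round(wasted_cost_reduction(0.60, 0.70), 2))  # 0.25 (precision, +10 pp)
print(round(wasted_cost_reduction(0.72, 0.90), 2))  # 0.64 (recall, +18 pp)
```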

Journal ArticleDOI
TL;DR: In this article, the authors review the different techniques used in test case selection and test case prioritization and the metrics used to evaluate their efficiency, using different artificial intelligence techniques, and identify the most effective of them.
Abstract: The goal of the testing process is to find errors and defects in the software being developed so that they can be fixed and corrected before delivery to the customer. Regression testing is an essential quality testing technique during the maintenance phase of the program, as it is performed to ensure the integrity of the program after modifications have been made. As the software develops, the test suite becomes too large to be fully executed within the given test cost in terms of budget and time. Therefore, the cost of regression testing should be reduced using different techniques; here we discuss several methods, such as the retest-all technique, the regression test selection (RTS) technique, and the test case prioritization (TCP) technique. The efficiency of these techniques is evaluated through the use of several metrics, such as average percentage of faults detected (APFD), average percentage block coverage (APBC), and average percentage decision coverage (APDC). In this paper, we examine these different techniques used in test case selection and test case prioritization and the metrics used to evaluate their efficiency, applying different artificial intelligence techniques, and identify the most effective of them.
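The APFD metric mentioned above has a standard closed form; a minimal sketch, where the fault-matrix encoding (`faults` mapping each fault to the tests that detect it) is an assumption for illustration:

```python
def apfd(order, faults):
    """Average Percentage of Faults Detected for a test ordering.
    `order` is the prioritized list of test ids; `faults` maps each fault
    to the set of tests that detect it. Standard formula:
    APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n)."""
    n, m = len(order), len(faults)
    position = {test: i + 1 for i, test in enumerate(order)}
    # TF_i: 1-based position of the first test that detects fault i.
    tf_sum = sum(min(position[t] for t in detecting)
                 for detecting in faults.values())
    return 1.0 - tf_sum / (n * m) + 1.0 / (2 * n)

faults = {"f1": {"t1"}, "f2": {"t2"}, "f3": {"t3"}, "f4": {"t4"}}
print(apfd(["t1", "t2", "t3", "t4", "t5"], faults))  # 1 - 10/20 + 1/10 = 0.6
```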

Proceedings ArticleDOI
19 Jan 2023
TL;DR: Test Interaction Console (tico) as mentioned in this paper is a toolbox for MATLAB to assist the system developer with authoring and executing large parameterizable test suites, which can be considered as an add-on layer to the core testing framework in MATLAB.
Abstract: We present Test Interaction Console (tico), a toolbox for MATLAB to assist the system developer with authoring and executing large parameterizable test suites. It can be considered as an add-on layer to the core testing framework in MATLAB, where our focus is on ease of use and providing high-level tools to automate recurring tasks in the aerospace domain of model-based design testing. tico is built around a custom command line dispatcher that mimics Bash syntax and has a flexible plugin architecture to extend its capabilities. In the toolbox, we provide enough plugins to connect MATLAB test artifacts with the DevOps platform GitLab and Polarion servers as an example for application life-cycle management (ALM) environments. This integration is the foundation for a complete round-trip solution to validate high-level aircraft requirements defined in Polarion and their test cases executed in MATLAB. Our custom ‘grid’ data container simplifies the parameterization of test cases over flight envelopes with thousands of iterations and encourages a style of dataflow programming. Tag annotations of file artifacts on test cases enable tico to resolve a graph of task dependencies and speed up subsequent tasks by reusing data files across the test suite. All configuration for the toolbox is handled with environment variables, simplifying execution and deployment on virtual platforms, such as Docker.

Posted ContentDOI
20 Jan 2023
TL;DR: ChatGPT as mentioned in this paper offers a dialogue system through which further information, e.g., the expected output for a certain input or an observed error message, can be entered, and its success rate can be further increased, fixing 31 out of 40 bugs.
Abstract: To support software developers in finding and fixing software bugs, several automated program repair techniques have been introduced. Given a test suite, standard methods usually either synthesize a repair, or navigate a search space of software edits to find test-suite passing variants. Recent program repair methods are based on deep learning approaches. One of these novel methods, which is not primarily intended for automated program repair, but is still suitable for it, is ChatGPT. The bug fixing performance of ChatGPT, however, is so far unclear. Therefore, in this paper we evaluate ChatGPT on the standard bug fixing benchmark set, QuixBugs, and compare the performance with the results of several other approaches reported in the literature. We find that ChatGPT's bug fixing performance is competitive with the common deep learning approaches CoCoNut and Codex and notably better than the results reported for the standard program repair approaches. In contrast to previous approaches, ChatGPT offers a dialogue system through which further information, e.g., the expected output for a certain input or an observed error message, can be entered. By providing such hints to ChatGPT, its success rate can be further increased, fixing 31 out of 40 bugs and outperforming the state of the art.

Journal ArticleDOI
TL;DR: In this paper, a combination of text embedding, text similarity, and clustering techniques is used to identify similar test cases, which helps to reduce manual testing effort and time.
Abstract: Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves high performance in clustering test steps (an F-score of 87.39%) and identifying similar test cases (an F-score of 86.13%). Furthermore, a validation with developers indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce manual testing effort and time.
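A minimal sketch of the step-clustering idea, using bag-of-words cosine similarity with greedy clustering. This is a heavy simplification with an invented threshold, not the paper's embedding-based pipeline:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counter vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_steps(steps, threshold=0.8):
    """Greedy single-pass clustering: a step joins the first cluster whose
    representative it resembles above `threshold`, else starts a new one."""
    vectors = [Counter(s.lower().split()) for s in steps]
    clusters = []  # each cluster is a list of step indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

steps = ["Open the settings menu",
         "Open the settings menu again",
         "Delete the user account"]
print(cluster_steps(steps))  # [[0, 1], [2]] — near-duplicate steps grouped
```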

Journal ArticleDOI
TL;DR: In this article, an inverse reinforcement learning framework with the Q-learning mechanism (IRLMFO) is designed to strengthen the performance of the MFO algorithm on large-scale real-parameter optimization problems.
Abstract: A reward function is learned from expert examples by inverse reinforcement learning (IRL), which is more reliable than an artificial method. The moth–flame optimization algorithm (MFO), which is based on the navigation mechanism of a moth flying at night, has been extensively employed to address complex optimization problems. An inverse reinforcement learning framework with the Q-learning mechanism (IRLMFO) is designed to strengthen the performance of the MFO algorithm on large-scale real-parameter optimization problems. The right strategy is chosen by the Q-learning mechanism, using historical data provided by the relevant approach in the strategy pool, which stores strategies with diverse functions. A competition mechanism is designed to strengthen the exploitation capability of the IRLMFO algorithm. The performance of the IRLMFO is verified on the CEC 2017 benchmark test suite. Experimental results illustrate that the IRLMFO outperforms state-of-the-art algorithms.
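The Q-learning piece can be sketched in its simplest, single-state tabular form. The strategy names and rewards below are invented for illustration; this is not the IRLMFO implementation:

```python
def q_update(q, strategy, reward, alpha=0.5):
    """One tabular Q-learning step for a single-state strategy-selection
    problem (no next-state term): Q(s) <- Q(s) + alpha * (reward - Q(s))."""
    q[strategy] = q[strategy] + alpha * (reward - q[strategy])

# Hypothetical strategy pool; rewards stand in for observed fitness improvements.
q = {"gaussian_mutation": 0.0, "levy_flight": 0.0, "opposition": 0.0}
for r in [1.0, 1.0, 0.8]:
    q_update(q, "levy_flight", r)
for r in [0.2, 0.1]:
    q_update(q, "gaussian_mutation", r)

print(max(q, key=q.get))  # levy_flight — the strategy with the best Q-value
```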

Proceedings ArticleDOI
15 Mar 2023
TL;DR: In this article, the authors present an optimization of an existing test case minimization algorithm based on forward-propagation of the cause-effect graphing method, which performs test case prioritization based on test case strength.
Abstract: Many different methods are used for generating blackbox test case suites. Test case minimization is used for reducing the feasible test case suite size in order to minimize the cost of testing while ensuring maximum fault detection. This paper presents an optimization of the existing test case minimization algorithm based on forward-propagation of the cause-effect graphing method. The algorithm performs test case prioritization based on test case strength, a newly introduced test case selection metric. The optimized version of the minimization algorithm was evaluated by using thirteen different examples from the available literature. In cases where the existing algorithm did not generate the minimum test case subsets, significant improvements of test effect coverage metric values were achieved. Test effect coverage metric values were not improved only in cases where maximum optimization was already achieved by using the existing algorithm.
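A greedy sketch of strength-driven selection. Here "test case strength" is simplified to the number of newly covered effects, which is an assumption for illustration; the paper's metric is defined on cause-effect graphs:

```python
def minimize_suite(effects_by_test):
    """Greedy minimization sketch: repeatedly pick the test covering the most
    not-yet-covered effects, stopping once no test adds new coverage."""
    covered, selected = set(), []
    remaining = dict(effects_by_test)
    while any(effects - covered for effects in remaining.values()):
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        selected.append(best)
        covered |= remaining.pop(best)
    return selected

tests = {"t1": {"e1", "e2"}, "t2": {"e2"}, "t3": {"e3", "e4", "e5"}}
print(minimize_suite(tests))  # ['t3', 't1'] — t2 is redundant and dropped
```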

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, the authors propose a new multi-objective MDO test suite, based on the popular ZDT bi-objective benchmark problems, which is scalable in the number of disciplines and design variables.
Abstract: Multidisciplinary design optimization (MDO) involves solving problems that feature multiple subsystems or disciplines, which is an important characteristic of many complex real-world problems. Whilst a range of single-objective benchmark problems have been proposed for MDO, there exists only a limited selection of multi-objective benchmarks, with only one of these problems being scalable in the number of disciplines. In this paper, we propose a new multi-objective MDO test suite, based on the popular ZDT bi-objective benchmark problems, which is scalable in the number of disciplines and design variables. Dependencies between disciplines can be defined directly in the problem formulation, enabling a diverse set of multidisciplinary topologies to be constructed that can resemble more realistic MDO problems. The new problems are solved using a multidisciplinary feasible architecture which combines a conventional multi-objective optimizer (NSGA-II) with a Newton-based multidisciplinary analysis solver. Empirical findings show that it is possible to solve the proposed ZDT-MDO problems but that multimodal problem landscapes can pose a significant challenge to the optimizer. The proposed test suite can help stimulate more research into the neglected but important topic of multi-objective multidisciplinary optimization.
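The NSGA-II ranking step used to solve these problems, non-dominated sorting, can be sketched compactly (a simple O(n^2)-per-front version, assuming minimization; the sample points are invented):

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Rank objective vectors into successive non-dominated fronts,
    the core sorting step of NSGA-II."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(nondominated_fronts(pts))  # [[0, 1, 2], [3], [4]]
```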

Journal ArticleDOI
TL;DR: In this paper, several implementation guidelines based on a gate-level model, a design methodology to build a reliable GARO-based TRNG, and an online test are proposed to improve the robustness of FIRO-/GARO-based TRNGs.
Abstract: A TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually result in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golic (TC'06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs was noticed by Markus Dichtl (ePrint'15). We verify this problem and estimate the consequential entropy loss using Lyapunov exponents and the test suite of the NIST SP 800-90B standard. To address the problem of periodic oscillations, we propose several implementation guidelines based on a gate-level model, a design methodology to build a reliable GARO-based TRNG, and an online test to improve the robustness of FIRO-/GARO-based TRNGs. The gate-level implementation guidelines illustrate the causes of periodic oscillations, which are verified by actual implementations and bifurcation diagrams. Based on the design methodology, a suitable feedback polynomial can be selected by evaluating the candidate feedback polynomials. The analysis and understanding of periodic oscillation and FIRO-/GARO-based TRNGs are deepened by delay adjustment. A TRNG with the selected feedback polynomial may occasionally enter periodic oscillations, due to active attacks and the delay inconstancy of implementations. This inconstancy might be caused by self-heating, temperature and voltage fluctuation, and the process variation among different silicon chips. Thus, an online test module, as one indispensable component of TRNGs, is proposed to detect periodic oscillations. The detected periodic oscillation can be eliminated by adjusting the feedback polynomial or delays to improve the robustness. The online test module is composed of a lightweight and responsive detector with a high detection rate, outperforming existing detector designs and statistical tests.
The areas, power consumptions and frequencies are evaluated based on the ASIC implementations of a GARO, the sampling circuit and the online test module. The gate-level implementation guidelines promote the future establishment of the stochastic model of FIRO-/GARO-based TRNGs with a deeper understanding.
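A far simpler software stand-in for the online periodicity test (illustrative only; the thresholds and window logic are invented, and the paper's detector runs in hardware on the live bitstream):

```python
def looks_periodic(bits, max_period=8, min_repeats=4):
    """Flag a bitstream whose content repeats with some short period,
    the failure mode of FIRO-/GARO-based TRNGs described above."""
    for p in range(1, max_period + 1):
        window = len(bits) - p
        # Require enough data to observe min_repeats repetitions of the period.
        if window >= p * (min_repeats - 1) and \
           all(bits[i] == bits[i + p] for i in range(window)):
            return True
    return False

print(looks_periodic([0, 1] * 16))  # True — pure period-2 oscillation
print(looks_periodic([0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))  # False
```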

Book ChapterDOI
01 Jan 2023
TL;DR: In this paper, the authors report on the design of a genetic algorithm to tackle the underlying optimization problem in the context of an industry project at a software company developing tools for test automation.
Abstract: Time and cost of test execution increase when regression test suites grow over time. Techniques for test suite reduction have been proposed to streamline frequent test execution in continuous integration and to optimize the set of tests without sacrificing coverage and fault detection. In this paper we report on the design of a genetic algorithm to tackle the underlying optimization problem in the context of an industry project from a software company developing tools for test automation. The prototypical implementation of the algorithm has been applied to the project's test suite containing several hundred test cases. We achieved an optimal solution with a 28% reduction of test cases. The evaluation of the reduced test suite using higher-level coverage and mutation analyses showed a minimal loss of coverage. The results demonstrated that the genetic algorithm can be successfully applied in industry and the achieved results are able to satisfy the requirements of the studied project. Nevertheless, major challenges have been identified by applying the approach in industry. They are related to the reliable collection of test execution data from previous test runs and dealing with test suites containing tests exhibiting unpredictable side-effects and flakiness.

Journal ArticleDOI
TL;DR: In this article, the authors propose an approach to evolutionary generation of test suites for multi-path coverage of MPI programs with non-determinism, which can significantly reduce testing cost and difficulty.
Abstract: When a large number of target paths in a sequential program need to be covered, we can divide similar target paths into the same group and generate a test suite covering the same group of target paths at the same time, so as to reduce the testing cost. However, different communication edges may be run under the same test input when executing a Message-Passing Interface (MPI) program with non-determinism, which means different code fragments may be traversed, indicating the difficulty of generating a test suite to cover each group of target paths. This paper proposes an approach to evolutionary generation of test suites for multi-path coverage of MPI programs with non-determinism, which can significantly reduce the testing cost and difficulty. We first design an indicator for evaluating each traversal set of communication edges, which is used to form a relation matrix between each target path and each traversal set of communication edges, so as to divide all the target paths into a certain number of groups. Then, we construct an optimization model for test suite generation associated with each group. Finally, an evolutionary optimization algorithm is extended to solve each model and used to generate a test suite covering each group of target paths. The proposed approach is applied to seven benchmark MPI programs and compared with several state-of-the-art approaches; the experimental results illustrate that the proposed approach can efficiently generate a test suite, thus supporting the superiority of the proposed approach.
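The grouping step can be sketched as row-matching on the relation matrix. This is a simplification of the paper's indicator-based grouping; the path ids and matrix entries are invented:

```python
def group_paths(relation):
    """Group target paths that share the same row in the path-by-traversal-set
    relation matrix. `relation` maps path id -> tuple of 0/1 entries,
    one per traversal set of communication edges."""
    groups = {}
    for path, row in relation.items():
        groups.setdefault(row, []).append(path)
    return list(groups.values())

relation = {"p1": (1, 0), "p2": (1, 0), "p3": (0, 1)}
print(group_paths(relation))  # [['p1', 'p2'], ['p3']]
```

One test suite can then be evolved per group, instead of one per individual target path.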

Book ChapterDOI
TL;DR: In this article, the authors present an exploratory analysis of both IP2 and IP3, based on eight different ML methods, tested against an exhaustive test suite comprising seven multi-objective and 32 many-objective test instances.
Abstract: Recent studies have demonstrated that the performance of Reference Vector (RV) based evolutionary multi- and many-objective optimization algorithms can be improved through the intervention of Machine Learning (ML) methods. These studies have shown how efficient search directions, learned from the solutions of intermittent generations, can be utilized to create pro-convergence and pro-diversity offspring, leading to better convergence and diversity, respectively. The entailed steps of data-set preparation, training of ML models, and utilization of these models have been encapsulated as Innovized Progress operators, namely IP2 (for convergence improvement) and IP3 (for diversity improvement). Evidently, the focus in these studies has been on proof of concept, and no exploratory analysis has been done to investigate if, and how drastically, the operators’ performance may be impacted if their underlying ML methods (Random Forest for IP2, and kNN for IP3) are varied. This paper seeks to bridge this gap through an exploratory analysis of both IP2 and IP3, based on eight different ML methods, tested against an exhaustive test suite comprising seven multi-objective and 32 many-objective test instances. While the results broadly endorse the robustness of the existing IP2 and IP3 operators, they also reveal interesting trade-offs across the different ML methods, in terms of the Hypervolume (HV) metric and the corresponding run-time. Notably, within the ambit of the considered test suite and ML methods, kNN emerges as the winner for both IP2 and IP3, based on joint consideration of the HV metric and run-time.
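The core idea of learning search directions from past generations can be sketched in the spirit of a kNN-based progress operator. Everything below is hypothetical (tiny hand-made archive, plain-Python kNN); it is not the paper's IP2/IP3 implementation.

```python
# Archive of (old solution -> improved solution) pairs from earlier
# generations; invented data for illustration.
ARCHIVE = [
    ((0.0, 0.0), (0.1, 0.1)),
    ((1.0, 0.0), (0.9, 0.1)),
    ((0.0, 1.0), (0.1, 0.9)),
]

def knn_progress(x, k=2):
    """Nudge offspring x along the average improvement direction of its
    k nearest archived pairs (a minimal kNN-style learned direction)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(ARCHIVE, key=lambda pair: dist2(pair[0], x))[:k]
    step = [sum(n[1][i] - n[0][i] for n in nearest) / k for i in range(len(x))]
    return tuple(xi + si for xi, si in zip(x, step))
```

Swapping the kNN lookup for a Random Forest (or any of the eight ML methods compared in the paper) changes how the direction is learned, which is precisely the variation the exploratory analysis studies.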

Journal ArticleDOI
TL;DR: In this article, the authors propose a new benchmark framework and design a suite of new test functions with scalable high-dimensional decision-space constraints that are close to realistic features.
Abstract: Evolutionary constrained multiobjective optimization has received extensive attention and research in the past two decades, and many benchmarks have been proposed to evaluate constrained multiobjective evolutionary algorithms (CMOEAs). In existing benchmarks, however, the constraint functions are highly correlated with the objective values, which makes the features of the constraints too monotonic and different from the properties of real-world problems. Accordingly, previous CMOEAs cannot solve real-world problems well, as these generally involve decision-space constraints with multi-modal or non-linear features. We therefore propose a new benchmark framework and design a suite of new test functions with scalable high-dimensional decision-space constraints. Specifically, different high-dimensional constraint functions and mixed linkages in variables are considered, to stay close to realistic features. The framework provides several parameter interfaces, so that users can easily adjust the parameters to obtain variant functions and test the generalization performance of algorithms. Different types of existing CMOEAs are employed to exercise the proposed test functions, and the results show that they easily fall into local feasible regions. We therefore improve an evolutionary multitasking-based CMOEA to better handle these problems, designing a new search algorithm to enhance the search abilities of the populations. Compared with existing CMOEAs, the proposed CMOEA shows better performance.
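To make the benchmark idea concrete, here is a toy scalable bi-objective problem with a multi-modal decision-space constraint. It is illustrative only and is not one of the paper's benchmark functions.

```python
import math

def constrained_biobjective(x):
    """Toy scalable bi-objective problem: x is a list of floats in [0, 1];
    x[0] drives the Pareto trade-off, the remaining variables form a
    distance function g. The constraint acts on the decision variables
    themselves and is multi-modal (feasible valleys of a cosine landscape),
    so it is NOT derived from the objective values.
    Returns ((f1, f2), violation); violation <= 0 means feasible."""
    g = sum((xi - 0.5) ** 2 for xi in x[1:])
    f1 = x[0] * (1 + g)
    f2 = (1 - x[0]) * (1 + g)
    violation = sum(math.cos(6 * math.pi * xi) for xi in x[1:]) - (len(x) - 1) * 0.5
    return (f1, f2), violation
```

Because the cosine term oscillates, feasible regions recur periodically along each distance variable, which is the kind of local feasible region the abstract reports existing CMOEAs getting stuck in. The number of decision variables scales freely by lengthening `x`.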

Journal ArticleDOI
TL;DR: In this paper, a test case prioritization/selection strategy, the Minimal Aggregate of the Diversity of All Groups (MADAG), is proposed to prioritize/select test cases using information on the diversity of each test case's execution trace.
Abstract: Spectrum-based fault localization (SBFL), which utilizes spectrum information of test cases to calculate the suspiciousness of each statement in a program, can reduce developers’ effort. However, applying redundant test cases from a test suite to fault localization incurs a heavy burden, especially in a resource-constrained environment, and it is expensive and infeasible to inspect the results of every test input. Prioritizing/selecting appropriate test cases is therefore important for the practical application of SBFL. In addition, we must ensure that applying the selected tests to SBFL achieves approximately the fault-localization effectiveness obtained with the whole test suite. This paper presents a test case prioritization/selection strategy, the Minimal Aggregate of the Diversity of All Groups (MADAG). The MADAG strategy prioritizes/selects test cases using information on the diversity of each test case's execution trace. We implemented and applied the MADAG strategy to 233 faulty versions of the Siemens and UNIX programs from the Software-artifact Infrastructure Repository. The experiments show that (1) the MADAG strategy uses only 8.99 and 14.27 test cases, with an average of 18, from the Siemens and UNIX test suites, respectively, while the SBFL technique achieves approximately the same fault-localization effectiveness as with all test cases and outperforms the previous best test case prioritization method; and (2) applying all tests from the test suite may not achieve better fault-localization effectiveness than the tests selected by the MADAG strategy.
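The general idea of trace-diversity-based prioritization can be sketched with a greedy ordering over execution traces. The traces and the average-Jaccard-distance heuristic below are invented for illustration; MADAG's actual aggregation over groups is not detailed in the abstract.

```python
# Hypothetical execution traces: test name -> set of executed statements.
TRACES = {
    "t1": {1, 2, 3},
    "t2": {1, 2, 3, 4},
    "t3": {5, 6},
    "t4": {1, 5},
}

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

def prioritize_by_diversity(traces):
    """Greedy ordering: start from the broadest trace, then repeatedly pick
    the test whose trace is farthest (on average) from those already chosen."""
    remaining = dict(traces)
    first = max(remaining, key=lambda t: len(remaining[t]))
    order = [first]
    del remaining[first]
    while remaining:
        nxt = max(
            remaining,
            key=lambda t: sum(jaccard_distance(remaining[t], traces[s])
                              for s in order) / len(order),
        )
        order.append(nxt)
        del remaining[nxt]
    return order

order = prioritize_by_diversity(TRACES)
```

A prefix of such an ordering already exercises very different program behaviors, which is why few selected tests can approximate the SBFL effectiveness of the whole suite.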

Journal ArticleDOI
TL;DR: In this paper, test suite construction for GUI software is carried out using a grey-box approach, with prior test design of window access controls for unit testing: a front-end white-box method followed by a black-box method for integration testing.
Abstract: In this paper, test suite construction for GUI (Graphical User Interface) software is carried out using a grey-box approach, with prior test design of window access controls for unit testing: a front-end white-box method followed by a black-box method for integration testing. Two key points are proposed for test suite construction for GUI software: first, the “Triple-step method” should be used for unit testing, with prior handling of boundary-value testing for the data of input controls; second, the “Grey-box approach” should be applied in integration testing for GUI software, with the necessary testing preparation as a precondition. At the same time, testing of the baseline version and incremental testing should be considered in test case construction, to keep pace with the overall evolution of today's software products. All these ideas are verified with a typical GUI software case, PQMS (Product Quality Monitoring Software/System), and the results indicate that the methods are practical and effective.
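The boundary-value testing of input controls mentioned above builds on standard boundary-value analysis, which can be sketched as follows. The "age" control and its 0..120 range are hypothetical, and the paper's "Triple-step method" itself is not detailed in the abstract.

```python
def boundary_values(lo, hi):
    """Classic boundary-value candidates for a numeric input control:
    just below, at, and just above each boundary."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

def expected_valid(value, lo, hi):
    """Oracle: the control should accept exactly the in-range values."""
    return lo <= value <= hi

# Example: a hypothetical "age" input control accepting 0..120.
cases = [(v, expected_valid(v, 0, 120)) for v in boundary_values(0, 120)]
```

Each `(input, expected)` pair becomes one unit test against the window's input control, before the grey-box integration testing stage.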


Posted ContentDOI
19 Jan 2023
TL;DR: PyOED, as discussed by the authors, is a toolkit for model-constrained OED for inverse problems, allowing researchers to experiment with standard and innovative OED technologies on a wide range of test problems (e.g., simulation models).
Abstract: This paper describes the first version (v1.0) of PyOED, a highly extensible scientific package that enables developing and testing model-constrained optimal experimental design (OED) for inverse problems. Specifically, PyOED aims to be a comprehensive Python toolkit for model-constrained OED. The package targets scientists and researchers interested in understanding the details of OED formulations and approaches. It is also meant to enable researchers to experiment with standard and innovative OED technologies on a wide range of test problems (e.g., simulation models). Thus, PyOED is continuously being expanded with a plethora of Bayesian inversion, data assimilation (DA), and OED methods, as well as new scientific simulation models, observation error models, and observation operators. These pieces are added such that they can be permuted, enabling OED methods to be tested in settings of varying complexity. The PyOED core is written entirely in Python and utilizes its inherent object-oriented capabilities; however, the current version of PyOED is meant to be extensible rather than scalable. Specifically, PyOED is developed to “enable rapid development and benchmarking of OED methods with minimal coding effort and to maximize code reutilization.” This paper provides a brief description of the PyOED layout and philosophy, and provides a set of exemplary test cases and tutorials to demonstrate how the package can be utilized.
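For flavor, here is what a minimal model-constrained OED computation looks like for a scalar Bayesian linear inverse problem. This generic pure-Python sketch deliberately does not use the PyOED API; the sensor data is invented.

```python
# Scalar parameter theta observed through linear sensors
# y_i = h_i * theta + noise(sigma_i). With a Gaussian prior, choosing the
# experiment that minimizes posterior variance reduces to maximizing the
# added precision h_i^2 / sigma_i^2 (invented sensors below).
PRIOR_PRECISION = 1.0
SENSORS = {"s1": (2.0, 1.0), "s2": (1.0, 0.1), "s3": (3.0, 2.0)}  # (h, sigma)

def posterior_variance(chosen):
    """Posterior variance of theta after observing the chosen sensors."""
    precision = PRIOR_PRECISION + sum(
        (SENSORS[s][0] / SENSORS[s][1]) ** 2 for s in chosen
    )
    return 1.0 / precision

def best_single_sensor():
    """Brute-force the optimal budget-1 design (an A-optimality criterion)."""
    return min(SENSORS, key=lambda s: posterior_variance([s]))
```

Real model-constrained OED replaces the scalar parameter with a PDE-governed field and the brute-force search with gradient-based or greedy optimization, which is the setting PyOED targets.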


Posted ContentDOI
07 Jun 2023
TL;DR: In this article, a set of test data satisfying each criterion is generated and used to explore whether one criterion subsumes the other and to assess the effectiveness of the test set developed for one methodology in terms of the other.
Abstract: Data-flow and higher-order mutation are white-box testing techniques. To our knowledge, no work has been proposed to compare data flow and higher-order mutation. This paper compares the all def-uses data-flow criterion and the second-order mutation criterion. The comparison investigates the subsumption relation between these two criteria and evaluates the effectiveness of the test data developed for each. To compare the two criteria, a set of test data satisfying each criterion is generated and used to explore whether one criterion subsumes the other and to assess the effectiveness of the test set developed for one methodology in terms of the other. The results showed that the mean mutation coverage ratio of the all du-pairs adequate test set is 80.9%, while the mean data-flow coverage ratio of the 2nd-order mutation adequate test set is 98.7%. Consequently, 2nd-order mutation “ProbSubsumes” all du-pairs data flow. The failure detection efficiency of mutation (98%) is significantly better than that of data flow (86%); consequently, 2nd-order mutation testing is “ProbBetter” than all du-pairs data-flow testing. In contrast, the test suite for 2nd-order mutation is larger than the test suite for all du-pairs.
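The cross-criterion evaluation at the heart of this comparison can be sketched as scoring one suite against both requirement sets. The kill matrix and du-pair coverage data below are invented; the paper's actual subjects and suites are not given in the abstract.

```python
# Hypothetical matrices: mutant -> set of tests that kill it, and
# du-pair -> set of tests that cover it.
KILLS = {"m1": {"t1"}, "m2": {"t2", "t3"}, "m3": set()}
DU_COVER = {"d1": {"t1", "t2"}, "d2": {"t3"}}

def score(matrix, suite):
    """Fraction of requirements (mutants or du-pairs) satisfied by suite."""
    hit = sum(1 for tests in matrix.values() if tests & suite)
    return hit / len(matrix)

suite = {"t1", "t3"}
mutation_score = score(KILLS, suite)   # m1 and m2 are killed, m3 survives
du_coverage = score(DU_COVER, suite)   # both du-pairs are covered
```

Computing each adequate suite's score on the *other* criterion, averaged over subjects, yields exactly the kind of cross-coverage ratios (80.9% and 98.7%) the paper uses to argue the "ProbSubsumes" relation.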

Proceedings ArticleDOI
01 Mar 2023
TL;DR: In this paper, the authors propose a dynamic test proportion selection technique, DTPS, which incorporates intra- and inter-build cost reduction techniques to save CI cost at the build level.
Abstract: Continuous integration is widely used in modern software engineering. However, it is an expensive practice. Proposed approaches focus on either intra- or inter-build cost reduction. Test case prioritization and selection (TCPS) techniques, typical intra-build techniques, are designed to reduce the high cost of CI by identifying failing test cases for failing builds. However, existing works do not distinguish the characteristics of builds; they apply an identical test selection proportion to every build. Build-prediction techniques, typical inter-build techniques, are designed to save CI cost at the build level: if a build is deemed likely to pass based on the prediction of a machine learning model, the whole test suite is skipped for it. Building in such a manner may miss real failing test cases whenever the model produces a false negative. In this paper, we propose a dynamic test proportion selection technique, DTPS, which incorporates both intra- and inter-build cost reduction. DTPS uses build features to construct machine learning models that predict the probability of a specific build failing and transforms that probability into the necessary test proportion, with respect to a selected test case prioritization technique. Based on the model's output, it selects a prioritized test suite and a variable proportion of test cases for each build. We constructed a large-scale dataset with approximately 115,000 builds and conducted a controlled experiment on it. The experiment shows that DTPS significantly outperforms existing techniques: it detects 19.9% to 32.5% more failed test cases than the state-of-the-art techniques evaluated in the experiment, and it performs better than all three existing peer techniques on approximately 47% of projects. Moreover, the experiment shows that our failure prediction model improves the Area Under the Curve (AUC) by 0.15 compared to prior machine learning models.
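The probability-to-proportion transformation can be sketched as follows. The linear mapping, the floor value, and the suite names are all assumptions for illustration; DTPS's actual transformation is not specified in the abstract.

```python
def proportion_from_probability(p_fail, floor=0.1):
    """Map a predicted build-failure probability to a test-selection
    proportion: the riskier the build, the larger the prefix of the
    prioritized suite that is run. A floor keeps a safety net even for
    builds predicted to pass, avoiding the all-or-nothing skipping of
    pure build-prediction techniques."""
    return max(floor, min(1.0, p_fail))

def select_tests(prioritized_suite, p_fail):
    """Take the leading proportion of an already-prioritized suite."""
    k = max(1, round(proportion_from_probability(p_fail) * len(prioritized_suite)))
    return prioritized_suite[:k]

# Hypothetical suite already ordered by a prioritization technique.
suite = ["t_hot", "t_recent", "t_slow", "t_legacy"]
risky = select_tests(suite, 0.75)  # risky build: run most of the suite
safe = select_tests(suite, 0.02)   # likely-passing build: run only the floor
```

Because even a "safe" build runs a non-empty prefix, a false-negative prediction can still surface a failure, which is the motivation for combining the intra- and inter-build techniques.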