How high will it be? Using machine learning models to predict branch coverage in automated testing
Citations
Machine Learning Applied to Software Testing: A Systematic Mapping Study
A large scale empirical comparison of state-of-the-art search-based test case generators
Lightweight Assessment of Test-Case Effectiveness Using Source-Code-Quality Indicators
Summarization techniques for code, change, testing, and user feedback (Invited paper)
Branch coverage prediction in automated testing
References
Scikit-learn: Machine Learning in Python
LIBSVM: A library for support vector machines
A metrics suite for object oriented design
Frequently Asked Questions (12)
Q2. What future works have the authors mentioned in the paper "How high will it be? Using machine learning models to predict branch coverage in automated testing"?
These results represent the main input for their future work. Future efforts will involve both more sophisticated features, applying a feature-selection analysis to remove redundant or irrelevant ones, and investigating different algorithms.
Q3. What is the only parameter to optimize in this case?
The only parameter to optimize in this case is α, a regularization parameter that avoids rescaling the epsilon value when y is under or over a certain factor [34].
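A minimal sketch of tuning this α parameter, assuming the answer refers to scikit-learn's `HuberRegressor` (whose `alpha` is an L2 regularization strength); the data here is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Synthetic linear data standing in for the paper's feature matrix.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# alpha is the regularization parameter discussed in the answer.
model = HuberRegressor(alpha=0.0001)
model.fit(X, y)
print(round(model.score(X, y), 2))
```

In practice α would be selected via cross-validated grid or random search rather than fixed by hand.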
Q4. What is the term used to describe the performance of the algorithms?
The authors measured the performance of the employed algorithms in terms of Mean Absolute Error (MAE), formally defined as: MAE = (1/n) Σ_{i=1}^{n} |y_i − x_i|, where y_i is the predicted value, x_i is the observed value for class i, and n is the number of classes in the training set.
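The definition above can be sketched directly in a few lines of Python; the inputs are illustrative coverage values, not data from the paper:

```python
# MAE = (1/n) * sum_i |y_i - x_i|, with y_i predicted and x_i observed.
def mean_absolute_error(predicted, observed):
    assert len(predicted) == len(observed)
    return sum(abs(y - x) for y, x in zip(predicted, observed)) / len(predicted)

print(mean_absolute_error([0.5, 0.8, 0.2], [0.4, 0.9, 0.2]))  # ≈ 0.0667
```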
Q5. What is the MAE for the SVR algorithm with the training set?
The MAE for the SVR algorithm with the training set is 0.216 and 0.088 for EvoSuite and Randoop, respectively, while for the validation set the authors report average MAEs of about 0.291 (+34%) and 0.225 (+155%).
Q6. What are the features that capture the closeness to an optimal package characteristic?
It captures the closeness to an optimal package characteristic when the package is abstract and stable, i.e., A = 1, I = 0, or concrete and unstable, i.e., A = 0, I = 1.
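The metric alluded to here is the "distance from the main sequence" from Robert Martin's package metrics, conventionally defined as D = |A + I − 1| (an assumption drawn from the standard definition, not quoted from the paper); a quick sketch:

```python
# D = |A + I - 1|: 0 on the "main sequence" (abstract+stable or
# concrete+unstable), 1 at the worst corners. Standard Martin-metrics
# formula, assumed here rather than taken from the paper.
def distance_from_main_sequence(abstractness, instability):
    return abs(abstractness + instability - 1)

print(distance_from_main_sequence(1.0, 0.0))  # optimal: 0.0
print(distance_from_main_sequence(0.0, 1.0))  # optimal: 0.0
print(distance_from_main_sequence(1.0, 1.0))  # worst case: 1.0
```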
Q7. What is the second set of features used in RQ2?
The second one is a library of mathematics and statistics operators, while the latter provides helper utilities for Java core classes.
Q8. What was the selected hyper-parameter for EvoSuite?
In the end, the best selected hyper-parameters for EvoSuite were α = 0.3715 and a neural network configuration of (5, 219), where the first value is the number of layers and the second is the number of units per layer.
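Assuming the network is a scikit-learn `MLPRegressor`, a "(5, 219)" configuration (5 hidden layers of 219 units each) would map onto the `hidden_layer_sizes` argument as a tuple repeated per layer; this mapping is an assumption about their setup, not quoted from the paper:

```python
from sklearn.neural_network import MLPRegressor

# 5 hidden layers, 219 units per layer, as reported in the answer.
n_layers, n_units = 5, 219
model = MLPRegressor(hidden_layer_sizes=(n_units,) * n_layers)
print(model.hidden_layer_sizes)  # (219, 219, 219, 219, 219)
```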
Q9. What is the main input for future efforts?
Future efforts will both involve more sophisticated features, applying a features selection analysis to remove the redundant or irrelevant ones, and investigate different algorithms.
Q10. What kind of features are used to capture the complexity of the CUTs?
To capture the complexity of the CUTs, the authors use different kinds of features, i.e., package-level features, CK and OO features, and Java reserved keywords.
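A hypothetical sketch of what one feature vector for a class under test (CUT) might look like, grouping the three feature families named in the answer; every name and value below is illustrative, not taken from the paper:

```python
# Illustrative feature vector for one CUT; keys and values are invented.
cut_features = {
    # package-level features
    "abstractness": 0.4,
    "instability": 0.7,
    # CK and OO features
    "wmc": 12,   # weighted methods per class
    "cbo": 5,    # coupling between objects
    # Java reserved-keyword counts
    "keyword_if": 8,
    "keyword_for": 3,
}
print(len(cut_features))
```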
Q11. How many algorithms were used to predict branch coverage?
To get a broad overview of the extent to which a machine learning model can predict the branch coverage achieved by test-data generation tools, the authors initially experimented with three different algorithms: Huber Regression [22], Support Vector Regression [10], and the Vector Space Model [28].
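A sketch of comparing two of these three algorithms, Huber Regression and SVR, via their scikit-learn implementations on synthetic stand-in data (the Vector Space Model has no drop-in scikit-learn regressor, so it is omitted here):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Synthetic stand-ins: rows are CUTs, the target mimics a coverage ratio.
rng = np.random.RandomState(42)
X = rng.uniform(0, 1, size=(200, 4))
y = X.sum(axis=1) / 4

results = {}
for name, model in [("huber", HuberRegressor()), ("svr", SVR())]:
    model.fit(X[:150], y[:150])                     # train split
    preds = model.predict(X[150:])                  # held-out split
    results[name] = mean_absolute_error(y[150:], preds)
print(results)
```

The paper's actual evaluation uses MAE on real EvoSuite/Randoop coverage data; this only shows the comparison mechanics.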
Q12. What is the effect of Huber Regression on the fit?
On the contrary, Huber Regression applies only a linear loss to such observations, thereby softening their impact on the overall fit.
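The contrast with squared loss can be sketched directly: beyond a threshold δ, the Huber loss grows linearly in the residual rather than quadratically, so a large outlier contributes far less to the fit:

```python
# Standard Huber loss: quadratic for small residuals, linear beyond delta.
def squared_loss(residual):
    return residual ** 2

def huber_loss(residual, delta=1.0):
    r = abs(residual)
    return 0.5 * r ** 2 if r <= delta else delta * (r - 0.5 * delta)

for r in (0.5, 1.0, 5.0):
    print(r, squared_loss(r), huber_loss(r))
# at r = 5.0 the squared loss is 25.0 but the Huber loss only 4.5
```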