Conference

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 

About: SANER is an academic conference. The conference publishes mainly in the area(s): Computer science & Software.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
01 Jan 2022
TL;DR: A replication study of the work of Yan et al. shows that a coverage-driven method is less effective than a gradient-based method in terms of both uncovering defects and improving model robustness; it tests several hypotheses and further shows that even if coverage-driven methods are constrained to perform only differentiable transformations, the uncovered defects still cannot be repaired by adversarial training with gradient-based methods.
Abstract: Deep neural networks (DNN) have been widely applied in modern life, including critical domains like autonomous driving, making it essential to ensure the reliability and robustness of DNN-powered systems. As an analogy to code coverage metrics for testing conventional software, researchers have proposed neuron coverage metrics and coverage-driven methods to generate DNN test cases. However, Yan et al. doubt the usefulness of existing coverage criteria in DNN testing. They show that a coverage-driven method is less effective than a gradient-based method in terms of both uncovering defects and improving model robustness. In this paper, we conduct a replication study of the work by Yan et al. and extend the experiments for deeper analysis. A larger model and a dataset of higher-resolution images are included to examine the generalizability of the results. We also extend the experiments with more test case generation techniques and adjust the process of improving model robustness to be closer to the practical life cycle of DNN development. Our experiment results confirm the conclusion from Yan et al. that coverage-driven methods are less effective than gradient-based methods. Yan et al. find that using gradient-based methods to retrain cannot repair defects uncovered by coverage-driven methods. They attribute this to the fact that the two types of methods use different perturbation strategies: gradient-based methods perform differentiable transformations while coverage-driven methods can perform additional non-differentiable transformations. We test several hypotheses and further show that even if coverage-driven methods are constrained to perform only differentiable transformations, the uncovered defects still cannot be repaired by adversarial training with gradient-based methods. Thus, defensive strategies for coverage-driven methods should be further studied.
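To make the contrast concrete, here is a minimal sketch of the gradient-based perturbation style discussed above, in the spirit of FGSM: the test input is moved along the sign of the loss gradient, a purely differentiable transformation. The framework (PyTorch), toy model, input shape, and epsilon are illustrative assumptions, not the paper's setup.

import torch
import torch.nn.functional as F

def fgsm_test_case(model, x, y, eps=0.03):
    # Perturb the input in the direction that increases the loss,
    # then clamp back to the valid pixel range.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)  # toy image in [0, 1]
y = torch.tensor([3])         # toy label
adversarial_input = fgsm_test_case(model, x, y)

Coverage-driven methods, by contrast, may additionally apply non-differentiable transformations, which is exactly the distinction the authors probe.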

20 citations

Proceedings ArticleDOI
27 Jan 2022
TL;DR: This study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al.
Abstract: APIs (Application Programming Interfaces) are reusable software libraries and are building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches for processing API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), which is called aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on a traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural language, BERTOverflow, which is pre-trained on a text corpus extracted from posts on Stack Overflow, and CosSensBERT, which is designed for handling imbalanced data. The results show that all six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement in F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The results also suggest that CosSensBERT does not exhibit better performance than BERT in terms of F1-score, but it is still worth considering as it achieves better performance on MCC and AUC.
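As a rough illustration of the fine-tuning setup described above, the sketch below runs one training step of a pre-trained BERT checkpoint on a labeled API review using the Hugging Face transformers library; the aspect label set, checkpoint name, and training example are assumptions for illustration, not the paper's exact configuration.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ASPECTS = ["performance", "security", "usability", "other"]  # hypothetical label set

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(ASPECTS))
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One toy fine-tuning step on a single labeled review sentence.
batch = tok(["This API blocks the UI thread on large payloads."],
            return_tensors="pt", truncation=True, padding=True)
labels = torch.tensor([0])  # index of "performance"
loss = model(**batch, labels=labels).loss
loss.backward()
optim.step()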

14 citations

Proceedings ArticleDOI
01 Mar 2022
TL;DR: SDC-Scissor identifies simulation-based tests that are unlikely to detect faults in the self-driving car (SDC) software under test and skips them before their execution, which reduces the number of long-running simulations to execute and drastically increases the cost-effectiveness of simulation-based testing of SDC software.
Abstract: Simulation platforms facilitate the continuous development of complex systems such as self-driving cars (SDCs). However, previous results on testing SDCs using simulations have shown that most of the automatically generated tests do not strongly contribute to establishing confidence in the quality and reliability of the SDC. Therefore, those tests can be characterized as “uninformative”, and running them generally means wasting precious computational resources. We address this issue with SDC-Scissor, a framework that leverages machine learning to identify simulation-based tests that are unlikely to detect faults in the SDC software under test and skip them before their execution. Consequently, by filtering out those tests, SDC-Scissor reduces the number of long-running simulations to execute and drastically increases the cost-effectiveness of simulation-based testing of SDC software. Our evaluation on two large datasets and around 12,000 tests showed that SDC-Scissor achieved a higher classification F1-score (between 47% and 90%) than a randomized baseline in identifying tests that lead to a fault, and reduced the time spent running uninformative tests (speedup between 107% and 170%). Webpage & Video: https://github.com/ChristianBirchler/sdc-scissor
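The core idea, predicting before execution whether a test is worth running, can be sketched as a standard supervised classifier over static features of each driving scenario; the feature names, toy data, and choice of learner below are hypothetical and may differ from the paper's actual setup.

from sklearn.ensemble import RandomForestClassifier

# Rows: static features of a scenario, e.g. road length (m),
# number of turns, max curvature (all hypothetical features).
X_train = [[120, 3, 0.05], [450, 9, 0.22], [300, 5, 0.11], [80, 1, 0.02]]
y_train = [0, 1, 1, 0]  # 1 = the test exposed a fault in past runs

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

candidate = [[380, 7, 0.18]]
if clf.predict(candidate)[0] == 1:
    print("run simulation")   # predicted fault-revealing
else:
    print("skip simulation")  # predicted uninformative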

13 citations

Proceedings ArticleDOI
20 Feb 2022
TL;DR: A large-scale, high-quality corpus of 1,168,257 question posts covering four popular programming languages is constructed from Stack Overflow, and an approach SOTitle is proposed for automatic post title generation by leveraging the code snippets and the problem description in the question post.
Abstract: On Stack Overflow, developers can not only browse question posts to solve their programming problems but also gain expertise from the question posts to help improve their programming skills. Therefore, improving the quality of question posts in Stack Overflow has attracted the wide attention of researchers. A concise and precise title can play an important role in helping developers understand the key information of the question post, which can improve the post quality. However, the titles developers write are often of low quality due to their lack of professional knowledge related to their questions or their poor presentation ability. A previous study aimed to automatically generate the title by analyzing the code snippets in the question post. However, this study ignored the useful information in the corresponding problem description. Therefore, we propose an approach SOTitle for automatic post title generation by leveraging the code snippets and the problem description in the question post (i.e., the multi-modal input). SOTitle follows the Transformer structure, which can effectively capture long-term dependencies through a multi-head attention mechanism. To verify the effectiveness of SOTitle, we construct a large-scale, high-quality corpus from Stack Overflow, which includes 1,168,257 high-quality question posts for four popular programming languages. Experimental results show that SOTitle significantly outperforms six state-of-the-art baselines in both automatic evaluation and human evaluation. To encourage follow-up studies, we make our corpus and approach publicly available.
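To illustrate the multi-modal input idea, the sketch below concatenates the problem description and code snippet and feeds them to an off-the-shelf encoder-decoder Transformer; the T5 checkpoint, prompt format, and generation settings are assumptions for illustration, not SOTitle's actual architecture or training.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

description = "My loop never terminates when reading the file."
code = "while not f.readline(): pass"
src = "generate title: " + description + " code: " + code  # joined multi-modal input

ids = tok(src, return_tensors="pt", truncation=True).input_ids
out = model.generate(ids, max_length=32, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))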

12 citations

Proceedings ArticleDOI
20 Feb 2022
TL;DR: This study formalizes automatic shellcode generation and summarization as dual tasks, uses a shallow Transformer for model construction, designs a normalization method Adjust_QKNorm to adapt to these low-resource tasks, and proposes a rule-based repair component to improve the performance of automatic shellcode generation.
Abstract: A shellcode is a small piece of code executed to exploit a software vulnerability, allowing the target computer to execute arbitrary commands from the attacker through a code injection attack. Similar to the purpose of automated vulnerability generation techniques, the automated generation of shellcode can produce attack instructions, which can be used to detect vulnerabilities and implement defensive measures. Meanwhile, the automated summarization of shellcode can help users unfamiliar with shellcode and network information security understand the intent of shellcode attacks. In this study, we propose a novel approach DualSC to solve the automatic shellcode generation and summarization tasks. Specifically, we formalize automatic shellcode generation and summarization as dual tasks, use a shallow Transformer for model construction, and design a normalization method Adjust_QKNorm to adapt to these low-resource tasks (i.e., insufficient training data). Finally, to alleviate the out-of-vocabulary problem, we propose a rule-based repair component to improve the performance of automatic shellcode generation. In our empirical study, we select a high-quality corpus Shellcode_IA32 as our empirical subject. This corpus was gathered from two real-world projects at line-by-line granularity. We first compare DualSC with six state-of-the-art baselines from the code generation and code summarization domains in terms of four performance measures. The comparison results show the competitiveness of DualSC. Then, we verify the effectiveness of the component settings in DualSC. Finally, we conduct a human study to further verify the effectiveness of DualSC.
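The specifics of Adjust_QKNorm are not given here, but it builds on query-key normalization in attention; the sketch below shows the general QKNorm idea (L2-normalizing queries and keys and replacing the usual 1/sqrt(d) scaling with a learnable gain), with shapes and the gain value chosen arbitrarily for illustration.

import torch
import torch.nn.functional as F

def qknorm_attention(q, k, v, g=8.0):
    # L2-normalize queries and keys, then scale their dot products by a
    # (normally learnable) gain g instead of 1/sqrt(d); this bounds the
    # attention logits, which helps stabilize training on small datasets.
    q = F.normalize(q, p=2, dim=-1)
    k = F.normalize(k, p=2, dim=-1)
    scores = g * torch.matmul(q, k.transpose(-2, -1))
    return torch.matmul(F.softmax(scores, dim=-1), v)

q, k, v = (torch.randn(1, 4, 8) for _ in range(3))  # (batch, seq, dim), toy shapes
out = qknorm_attention(q, k, v)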

11 citations

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2022    157