scispace - formally typeset
Search or ask a question
Author

Dongsun Kim

Bio: Dongsun Kim is an academic researcher from University of Luxembourg. The author has contributed to research in topics: Software system & Software construction. The author has an hindex of 20, co-authored 49 publications receiving 1664 citations. Previous affiliations of Dongsun Kim include Kyungpook National University & Sogang University.


Papers
More filters
Proceedings ArticleDOI
18 May 2013
TL;DR: A novel patch generation approach, Pattern-based Automatic program Repair (Par), using fix patterns learned from existing human-written patches to generate program patches automatically, which is more acceptable than GenProg.
Abstract: Patch generation is an essential software maintenance task because most software systems inevitably have bugs that need to be fixed. Unfortunately, human resources are often insufficient to fix all reported and known bugs. To address this issue, several automated patch generation techniques have been proposed. In particular, a genetic-programming-based patch generation technique, GenProg, proposed by Weimer et al., has shown promising results. However, these techniques can generate nonsensical patches due to the randomness of their mutation operations. To address this limitation, we propose a novel patch generation approach, Pattern-based Automatic program Repair (Par), using fix patterns learned from existing human-written patches. We manually inspected more than 60,000 human-written patches and found there are several common fix patterns. Our approach leverages these fix patterns to generate program patches automatically. We experimentally evaluated Par on 119 real bugs. In addition, a user study involving 89 students and 164 developers confirmed that patches generated by our approach are more acceptable than those generated by GenProg. Par successfully generated patches for 27 out of 119 bugs, while GenProg was successful for only 16 bugs.

549 citations

Journal ArticleDOI
TL;DR: A two-phase prediction model that uses bug reports' contents to suggest the files likely to be fixed and compared it with three other prediction models: the Usual Suspects, the one-phase model, and BugScout to find the best prediction performance.
Abstract: To support developers in debugging and locating bugs, we propose a two-phase prediction model that uses bug reports' contents to suggest the files likely to be fixed. In the first phase, our model checks whether the given bug report contains sufficient information for prediction. If so, the model proceeds to predict files to be fixed, based on the content of the bug report. In other words, our two-phase model "speaks up" only if it is confident of making a suggestion for the given bug report; otherwise, it remains silent. In the evaluation on the Mozilla "Firefox" and "Core" packages, the two-phase model was able to make predictions for almost half of all bug reports; on average, 70 percent of these predictions pointed to the correct files. In addition, we compared the two-phase model with three other prediction models: the Usual Suspects, the one-phase model, and BugScout. The two-phase model manifests the best prediction performance.

179 citations

Proceedings ArticleDOI
TL;DR: It is demonstrated that TBar correctly fixes 43 bugs from Defects4J, an unprecedented performance in the literature (including all approaches, i.e., template-based, stochastic mutation-based or synthesis-based APR).
Abstract: We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. Eventually, we find that, assuming a perfect fault localization, TBar correctly/plausibly fixes 74/101 bugs. Replicating a standard and practical pipeline of APR assessment, we demonstrate that TBar correctly fixes 43 bugs from Defects4J, an unprecedented performance in the literature (including all approaches, i.e., template-based, stochastic mutation-based or synthesis-based APR).

122 citations

Proceedings ArticleDOI
27 May 2018
TL;DR: FaCoY is proposed, a novel approach for statically finding code fragments which may be semantically similar to user input code which is more effective than online code-to-code search engines and can be useful in code/patch recommendation.
Abstract: Code search is an unavoidable activity in software development. Various approaches and techniques have been explored in the literature to support code search tasks. Most of these approaches focus on serving user queries provided as natural language free-form input. However, there exists a wide range of use-case scenarios where a code-to-code approach would be most beneficial. For example, research directions in code transplantation, code diversity, patch recommendation can leverage a code-to-code search engine to find essential ingredients for their techniques. In this paper, we propose FaCoY, a novel approach for statically finding code fragments which may be semantically similar to user input code. FaCoY implements a query alternation strategy: instead of directly matching code query tokens with code in the search space, FaCoY first attempts to identify other tokens which may also be relevant in implementing the functional behavior of the input code. With various experiments, we show that (1) FaCoY is more effective than online code-to-code search engines; (2) FaCoY can detect more semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-the-art; (3) FaCoY, while static, can detect code fragments which are indeed similar with respect to runtime execution behavior; and (4) FaCoY can be useful in code/patch recommendation.

107 citations

Proceedings ArticleDOI
10 Jul 2019
TL;DR: TBar as mentioned in this paper is a template-based APR tool that systematically applies these fix patterns to program bugs and finds that, assuming a perfect fault localization, TBar correctly/plausibly fixes 74/101 bugs.
Abstract: We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. Eventually, we find that, assuming a perfect fault localization, TBar correctly/plausibly fixes 74/101 bugs. Replicating a standard and practical pipeline of APR assessment, we demonstrate that TBar correctly fixes 43 bugs from Defects4J, an unprecedented performance in the literature (including all approaches, i.e., template-based, stochastic mutation-based or synthesis-based APR).

103 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Chapman and Miller as mentioned in this paper, Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40, 1990) and Section 5.8.
Abstract: 8. Subset Selection in Regression (Monographs on Statistics and Applied Probability, no. 40). By A. J. Miller. ISBN 0 412 35380 6. Chapman and Hall, London, 1990. 240 pp. £25.00.

1,154 citations

Proceedings ArticleDOI
21 May 2011
TL;DR: It is shown that randomized algorithms are used in a significant percentage of papers but that, in most cases, randomness is not properly accounted for, which casts doubts on the validity of most empirical results assessing randomized algorithms.
Abstract: Randomized algorithms have been used to successfully address many different types of software engineering problems. This type of algorithms employ a degree of randomness as part of their logic. Randomized algorithms are useful for difficult problems where a precise solution cannot be derived in a deterministic way within reasonable time. However, randomized algorithms produce different results on every run when applied to the same problem instance. It is hence important to assess the effectiveness of randomized algorithms by collecting data from a large enough number of runs. The use of rigorous statistical tests is then essential to provide support to the conclusions derived by analyzing such data. In this paper, we provide a systematic review of the use of randomized algorithms in selected software engineering venues in 2009. Its goal is not to perform a complete survey but to get a representative snapshot of current practice in software engineering research. We show that randomized algorithms are used in a significant percentage of papers but that, in most cases, randomness is not properly accounted for. This casts doubts on the validity of most empirical results assessing randomized algorithms. There are numerous statistical tests, based on different assumptions, and it is not always clear when and how to use these tests. We hence provide practical guidelines to support empirical research on randomized algorithms in software engineering

859 citations

Journal ArticleDOI
TL;DR: This paper provides guidelines on how to carry out and properly analyse randomized algorithms applied to solve software engineering tasks, with a particular focus on software testing, which is by far the most frequent application area of randomized algorithms within software engineering.
Abstract: Randomized algorithms are widely used to address many types of software engineering problems, especially in the area of software verification and validation with a strong emphasis on test automation. However, randomized algorithms are affected by chance and so require the use of appropriate statistical tests to be properly analysed in a sound manner. This paper features a systematic review regarding recent publications in 2009 and 2010 showing that, overall, empirical analyses involving randomized algorithms in software engineering tend to not properly account for the random nature of these algorithms. Many of the novel techniques presented clearly appear promising, but the lack of soundness in their empirical evaluations casts unfortunate doubts on their actual usefulness. In software engineering, although there are guidelines on how to carry out empirical analyses involving human subjects, those guidelines are not directly and fully applicable to randomized algorithms. Furthermore, many of the textbooks on statistical analysis are written from the viewpoints of social and natural sciences, which present different challenges from randomized algorithms. To address the questionable overall quality of the empirical analyses reported in the systematic review, this paper provides guidelines on how to carry out and properly analyse randomized algorithms applied to solve software engineering tasks, with a particular focus on software testing, which is by far the most frequent application area of randomized algorithms within software engineering. Copyright © 2012 John Wiley & Sons, Ltd.

510 citations

Proceedings ArticleDOI
11 Jan 2016
TL;DR: Experimental results show that, on a benchmark set of 69 real-world defects drawn from eight open-source projects, Prophet significantly outperforms the previous state-of-the-art patch generation system.
Abstract: We present Prophet, a novel patch generation system that works with a set of successful human patches obtained from open- source software repositories to learn a probabilistic, application-independent model of correct code. It generates a space of candidate patches, uses the model to rank the candidate patches in order of likely correctness, and validates the ranked patches against a suite of test cases to find correct patches. Experimental results show that, on a benchmark set of 69 real-world defects drawn from eight open-source projects, Prophet significantly outperforms the previous state-of-the-art patch generation system.

495 citations

Proceedings ArticleDOI
14 May 2016
TL;DR: Angelix is a novel semantics- based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR, and is more scalable than previously proposed semantics based repair methods such as SemFix and DirectFix.
Abstract: Since debugging is a time-consuming activity, automated program repair tools such as GenProg have garnered interest A recent study revealed that the majority of GenProg repairs avoid bugs simply by deleting functionality We found that SPR, a state-of-the-art repair tool proposed in 2015, still deletes functionality in their many "plausible" repairs Unlike generate-and-validate systems such as GenProg and SPR, semantic analysis based repair techniques synthesize a repair based on semantic information of the program While such semantics-based repair methods show promise in terms of quality of generated repairs, their scalability has been a concern so far In this paper, we present Angelix, a novel semantics-based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR This shows that Angelix is more scalable than previously proposed semantics based repair methods such as SemFix and DirectFix Furthermore, our repair method can repair multiple buggy locations that are dependent on each other Such repairs are hard to achieve using SPR and GenProg In our experiments, Angelix generated repairs from large-scale real-world software such as wireshark and php, and these generated repairs include multi-location repairs We also report our experience in automatically repairing the well-known Heartbleed vulnerability

464 citations