Proceedings Article

R2Fix: Automatically Generating Bug Fixes from Bug Reports

Chen Liu¹, Jinqiu Yang¹, Lin Tan¹, Munawar Hafiz²

University of Waterloo¹, Auburn University²

18 Mar 2013, pp. 282-291

TL;DR: R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically; the patches it generated correctly could have saved up to an average of 63 days of bug diagnosis and patch generation time.


Abstract: Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to the lack of the development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 57 patches correctly, 5 of which are new patches for bugs that have not been fixed by developers yet. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 57 correct patches generated by R2Fix could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.


Summary

To extract the line number for memory leak bugs, R2Fix takes the number after “:” or “at line” in the bug report.
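The extraction rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression and the function name `extract_line_number` are hypothetical and are not taken from R2Fix itself.

```python
import re
from typing import Optional

# Hypothetical pattern: a number that follows "at line" or a ":" (as in "foo.c:123").
_LINE_RE = re.compile(r"(?:\bat\s+line\s+|:\s*)(\d+)", re.IGNORECASE)


def extract_line_number(report: str) -> Optional[int]:
    """Return the first number found after ':' or 'at line', or None if there is none."""
    match = _LINE_RE.search(report)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Made-up report sentences, used only to exercise the sketch.
    print(extract_line_number("Memory leak in drivers/foo.c:123"))
    print(extract_line_number("The buffer is never freed at line 45 of bar.c"))
```

A real bug report may contain several numbers after a colon (timestamps, for example), so a production extractor would presumably need more context than this first-match rule.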
The developers first need to understand this bug report by reading the relevant code together with the report: the buffer state holds only 4 bytes, but 5 bytes were written to it, namely “off” followed by one space character and the single character ‘\0’ that is needed to mark the end of the string.
Developers often need to fix more bugs than their time and resources allow [6].


Figures (9)

Figure 1.1: Converting a bug report to a patch. “-” denotes a line to be deleted; “+” denotes a line to be added; and “ ” is one space character.

Table 8.1: R2Fix Appendix: Fix Patterns. [[]] marks an optional part, and [[]]* marks a part that may appear zero or more times. “Size” and “len” are often part of a buffer length variable name; once a buffer length variable is identified, the buffer length can be extracted simply by checking the right-hand side of “=”. To extract the line number for memory leak bugs, R2Fix takes the number after “:” or “at line” in the bug report. Note that the line number is required for only one memory leak subpattern and is not needed for any other subpattern.
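As a rough illustration of the buffer length rule in the caption above, the following Python sketch looks for an identifier containing “size” or “len” and reads the number on the right-hand side of “=”. The regular expression and the name `extract_buffer_length` are assumptions made for this example, not part of R2Fix.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an identifier containing "size" or "len",
# then "=", then a decimal number on the right-hand side.
_BUF_LEN_RE = re.compile(r"\b(\w*(?:size|len)\w*)\s*=\s*(\d+)", re.IGNORECASE)


def extract_buffer_length(text: str) -> Optional[Tuple[str, int]]:
    """Return (variable name, length) for the first match, or None if nothing matches."""
    match = _BUF_LEN_RE.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    # Made-up snippets, used only to exercise the sketch.
    print(extract_buffer_length("int buf_size = 4;"))
    print(extract_buffer_length("count = 3"))
```

The sketch handles only literal decimal values; a right-hand side that is a macro or an expression would need to be resolved against the source code.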

Table 5.2: Classification Results. Acc is Accuracy; Size is the training set size; and Pos is the number of positive bug reports (bug reports of the corresponding bug type) in the training sets.

Figure 5.1: R2Fix automatically generated the two patches, both of which have been accepted and committed to the code repository by the kernel developers soon after we reported them. The generated patches are directly applicable to the faulty code, but are simplified for illustration.

Table 2.2: Fix Pattern Examples. The numbers are the total number of fix patterns for each bug type.

Table 5.1: Overall Results. AVG denotes that the number in the cell is the average.

Table 4.1: Evaluated software. BR is the total number of bug reports, and LOC is lines of code.

Table 2.1: Common Fix Patterns. L denotes Linux and M denotes Mozilla.


R2Fix: Automatically Generating Bug Fixes from Bug Reports

Chen Liu

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of

Master of Applied Science

Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2012

© Chen Liu 2012

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

Abstract

Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to the lack of the development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three large and popular software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 60 patches correctly, 5 of which are new patches for bugs that have not been fixed by developers yet. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 60 correct patches generated by R2Fix could have shortened and saved an average of 68 days of bug diagnosis and patch generation time.


Acknowledgements

I would like to take this opportunity to express my deepest gratitude to my supervisor, Prof. Lin Tan. During my studies, she supported me in every aspect. I would like to thank her for her enthusiasm and unwavering support, and for her guidance, discussions, feedback, and encouragement. The things I learned from her will be extremely beneficial for my future development.

I am thankful to the readers of the thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for spending their valuable time reviewing the thesis and giving valuable comments.

Thanks to our research group members, especially Tian and Jinqiu. We have been in the group together for more than one year. I enjoyed discussing topics in scientific research with them, and I thank them for their inspiration and good ideas.

Lastly, and most importantly, I would like to acknowledge my family. My dear mother, the first teacher and the role model in my life, gives me the confidence to explore new things, especially in a different country far away from my homeland. I thank her for her endless support, sacrifice, and patience. My dear father comes next. He taught me how to develop interests in a scientific area, how to overcome difficulties in my studies, and how to solve problems in daily life. To them I dedicate this thesis.

Table of Contents

List of Tables

List of Figures

1 Introduction

1.1 Ideal Goal vs. Realistic Goal

1.2 Challenges

1.3 Contributions

2 A Study of Fix Patterns

2.1 Data Collection

2.2 Fix Pattern Study Results

3 R2Fix Overview

3.1 R2Fix Architecture

3.1.1 Bug Classifiers

3.1.2 Pattern Parameter Extractor

3.1.3 Patch Generator

3.2 Bug Classifiers

3.2.1 Keyword Search versus Classification

3.2.2 Bug Report Parsing and Classification


Frequently Asked Questions (9)

Q1. What are the contributions mentioned in the paper "R2Fix: Automatically Generating Bug Fixes from Bug Reports"?

The authors propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. The authors evaluate R2Fix on three large and popular software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. The authors reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories.

Q2. What future work is described in "R2Fix: Automatically Generating Bug Fixes from Bug Reports"?

In the future, the authors plan to generate patches for new types of bug reports, and to extend R2Fix to take the output of existing bug detection tools as input to improve the effectiveness of patch generation. R2Fix does not use the “Comment” fields of the bug reports, because the authors want to apply R2Fix as soon as a bug is reported to maximize the time and effort that R2Fix can save for developers in fixing bugs.

Q3. What is the purpose of the article?

The patch deletes the line that writes 5 bytes to buffer state (denoted by - strcpy(state, "off ");), and adds a new line to write only 4 bytes to state (+ strcpy(state, "off");), which fixes the overflow bug.
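The byte counts behind this patch can be checked with a short Python sketch. It is a worked example only: the helper names are hypothetical, and the buffer size of 4 is taken from the bug report description rather than from the kernel source.

```python
def c_string_bytes(literal: str) -> int:
    """Bytes strcpy writes for an ASCII string: its characters plus the terminating NUL."""
    return len(literal.encode("ascii")) + 1


def overflows(buffer_size: int, literal: str) -> bool:
    """True if copying the literal with strcpy would write past the end of the buffer."""
    return c_string_bytes(literal) > buffer_size


STATE_SIZE = 4  # size of the buffer `state`, as stated in the bug report

if __name__ == "__main__":
    for literal in ("off ", "off"):
        needed = c_string_bytes(literal)
        print(f"{literal!r}: needs {needed} bytes, overflows={overflows(STATE_SIZE, literal)}")
```

The deleted line copies “off” plus a trailing space and the terminator, one byte more than the buffer holds; the added line drops the space so that the copy fits exactly.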

Q4. What is the purpose of the bug report?

The developers first need to understand this bug report by reading the relevant code together with the report: the buffer state holds only 4 bytes, but 5 bytes were written to it, namely “off” followed by one space character and the single character ‘\0’ that is needed to mark the end of the string.

Q5. What is the name of the author?

My dear mother, the first teacher and the role model in my life, gives me confidence to explore new things, especially in a different country far away from my homeland.

Q6. What is the main focus of this paper?

Developers’ bug-fixing process is primarily manual; therefore the time required for producing a fix and its accuracy depend on the skill and experience of individuals.

Q7. What is the purpose of this article?

Developers often spend days, weeks, or even months diagnosing the root cause of a bug by reading the relevant source code, using a debugger to observe and modify the program execution on different inputs, etc.

Q8. What is the main purpose of this article?

After a developer determines the root cause, typically the developer needs to figure out how to modify the buggy code to fix the bug, check out the buggy version of the software, apply the fix, and generate the patch.

Q9. Who is the author of the thesis?

I am thankful to readers of the thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for spending their valuable time to review the thesis and give valuable comments.