Open Access Proceedings Article

R2Fix: Automatically Generating Bug Fixes from Bug Reports

TLDR
R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically, and could have saved up to an average of 63 days of bug diagnosis and patch generation time.
Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to a lack of development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix correctly generates 57 patches, 5 of which are new patches for bugs that developers have not yet fixed. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 57 correct patches generated by R2Fix could have saved up to an average of 63 days of bug diagnosis and patch generation time.



R2Fix: Automatically Generating Bug Fixes from Bug Reports
by
Chen Liu
A thesis presented to the University of Waterloo
in fulfillment of the thesis requirement for the degree of
Master of Applied Science in Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2012
© Chen Liu 2012

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.
I understand that my thesis may be made electronically available to the public.

Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to a lack of development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three large and popular software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix correctly generates 60 patches, 5 of which are new patches for bugs that developers have not yet fixed. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 60 correct patches generated by R2Fix could have saved an average of 68 days of bug diagnosis and patch generation time.
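The thesis contrasts simple keyword search with learned classification for deciding which type of bug a report describes (Section 3.2.1, "Keyword Search versus Classification"). As a rough illustration of the keyword-search baseline only, the C sketch below maps free-form report text to one of the three targeted bug types. The keyword table, the enum, and the function names are hypothetical; R2Fix's actual classifiers are built with machine learning techniques rather than hard-coded rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The three bug types targeted in the evaluation, plus a fallback. */
enum bug_type { BUG_UNKNOWN, BUG_BUFFER_OVERFLOW, BUG_NULL_POINTER, BUG_MEMORY_LEAK };

/* Hypothetical keyword rules: a keyword-search baseline, not R2Fix's classifier. */
struct keyword_rule {
    const char *keyword;
    enum bug_type type;
};

static const struct keyword_rule RULES[] = {
    { "overflow",     BUG_BUFFER_OVERFLOW },
    { "null pointer", BUG_NULL_POINTER },
    { "null deref",   BUG_NULL_POINTER },
    { "memory leak",  BUG_MEMORY_LEAK },
    { "leaks",        BUG_MEMORY_LEAK },
};

/* Case-insensitive substring test (strcasestr is not standard C). */
static int contains_ci(const char *text, const char *needle)
{
    size_t n;
    assert(text != NULL && needle != NULL);
    n = strlen(needle);
    for (; *text != '\0'; text++) {
        size_t i = 0;
        while (i < n && text[i] != '\0' &&
               tolower((unsigned char)text[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Return the bug type of the first matching rule, or BUG_UNKNOWN. */
static enum bug_type classify_report(const char *report)
{
    size_t i;
    for (i = 0; i < sizeof RULES / sizeof RULES[0]; i++)
        if (contains_ci(report, RULES[i].keyword))
            return RULES[i].type;
    return BUG_UNKNOWN;
}
```

For example, classify_report("NULL pointer dereference in foo()") returns BUG_NULL_POINTER. A fixed keyword list is easy to read but brittle: a report that describes a bug without using the expected words is missed, which is one motivation for a trained classifier.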

Acknowledgements
I would like to take this opportunity to express my deepest gratitude to my supervisor, Prof. Lin Tan, who supported me in every aspect of my studies. I thank her for her enthusiasm and unwavering support, and for her guidance, discussions, feedback, and encouragement. The things I learned from her will be extremely beneficial for my future development.
I am thankful to the readers of this thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for taking the time to review it and for their valuable comments.
Thanks to the members of our research group, especially Tian and Jinqiu. We have been in the group together for more than a year, and I enjoyed discussing scientific research with them. I thank them for their inspiration and good ideas.
Lastly, and most importantly, I would like to acknowledge my family. My dear mother, my first teacher and the role model in my life, gives me the confidence to explore new things, especially in a country far away from my homeland. I thank her for her endless support, sacrifice, and patience. My dear father comes next. He taught me how to develop interests in a scientific area, how to overcome difficulties in study, and how to solve problems in daily life. To them I dedicate this thesis.

Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Ideal Goal vs. Realistic Goal
1.2 Challenges
1.3 Contributions
2 A Study of Fix Patterns
2.1 Data Collection
2.2 Fix Pattern Study Results
3 R2Fix Overview
3.1 R2Fix Architecture
3.1.1 Bug Classifiers
3.1.2 Pattern Parameter Extractor
3.1.3 Patch Generator
3.2 Bug Classifiers
3.2.1 Keyword Search versus Classification
3.2.2 Bug Report Parsing and Classification

Frequently Asked Questions
Q1. What are the contributions mentioned in the paper "R2Fix: Automatically Generating Bug Fixes from Bug Reports"?

The authors propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. The authors evaluate R2Fix on three large and popular software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. The authors reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories.

In the future, the authors plan to generate patches for new types of bug reports and to extend R2Fix to take the output of existing bug detection tools as input, to improve the effectiveness of patch generation. R2Fix does not use the "Comment" fields of bug reports, because the authors want to apply R2Fix as soon as a bug is reported, to maximize the time and effort that R2Fix can save developers in fixing bugs.

The patch deletes the line that writes 5 bytes to the buffer state (- strcpy(state, "off ");) and adds a new line that writes only 4 bytes to state (+ strcpy(state, "off");), which fixes the overflow bug.
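Once the buggy buffer and the offending string are known, producing this patch amounts to filling a fix pattern with those parameters. The sketch below assumes a hypothetical struct holding what a pattern parameter extractor might report, and renders the removed and added lines in the same "-" / "+" form used above. It illustrates plain template instantiation only; it is not R2Fix's semantic patch generator, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fix-pattern parameters for the "replace the copied string" pattern. */
struct overflow_params {
    const char *buffer;     /* destination buffer name, e.g. "state" */
    const char *old_string; /* literal that overflows the buffer */
    const char *new_string; /* replacement literal that fits */
};

/* Write the removed and added lines of the patch into out.
   Returns snprintf's result: the length of the full text, or a negative
   value on error; a result >= out_size means the text was truncated. */
static int render_strcpy_fix(const struct overflow_params *p, char *out, size_t out_size)
{
    assert(p != NULL && out != NULL);
    return snprintf(out, out_size,
                    "- strcpy(%s, \"%s\");\n+ strcpy(%s, \"%s\");\n",
                    p->buffer, p->old_string, p->buffer, p->new_string);
}
```

With the parameters { "state", "off ", "off" }, the rendered text is exactly the two patch lines quoted above.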

The developers first need to understand this bug report by reading the relevant code together with the report: the buffer state holds only 4 bytes, but 5 bytes, "off␣\0", are written to it, where ␣ denotes one space character and the single character '\0' marks the end of the string.
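The byte counting in this diagnosis can be made explicit: strcpy copies strlen(src) characters plus the terminating '\0'. A minimal sketch, assuming the buffer is declared as char state[4] (the declaration itself is not quoted in the report), checks whether each literal fits and performs the patched copy. The helper names and STATE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size from the bug report: state holds 4 bytes. */
#define STATE_SIZE 4

/* strcpy writes strlen(src) characters plus the terminating '\0'.
   Returns 1 if that fits in a buffer of dst_size bytes, else 0. */
static int strcpy_fits(const char *src, size_t dst_size)
{
    assert(src != NULL);
    return strlen(src) + 1 <= dst_size;
}

/* Patched code path. The buggy original was
       strcpy(state, "off ");   (4 chars + '\0' = 5 bytes: overflow)
   whereas "off" needs 3 chars + '\0' = 4 bytes, which fits exactly. */
static void set_state_off(char state[STATE_SIZE])
{
    strcpy(state, "off");
}
```

Here strcpy_fits("off ", STATE_SIZE) yields 0 and strcpy_fits("off", STATE_SIZE) yields 1, matching the 5-byte versus 4-byte count above.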


Developers' bug-fixing process is primarily manual; therefore, the time required to produce a fix, and the fix's accuracy, depend on the skill and experience of individual developers.

Developers often spend days, weeks, or even months diagnosing the root cause of a bug by reading the relevant source code, using a debugger to observe and modify the program execution on different inputs, etc. 

After a developer determines the root cause, typically the developer needs to figure out how to modify the buggy code to fix the bug, check out the buggy version of the software, apply the fix, and generate the patch. 
