scispace - formally typeset
Open AccessProceedings ArticleDOI

R2Fix: Automatically Generating Bug Fixes from Bug Reports

Reads0
Chats0
TLDR
R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically and could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.
Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to the lack of the development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 57 patches correctly, 5 of which are new patches for bugs that have not been fixed by developers yet. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 57 correct patches generated by R2Fix could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.

read more

Content maybe subject to copyright    Report

R2Fix: Automatically Generating Bug
Fixes from Bug Reports
by
Chen Liu
A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2012
c
Chen Liu 2012

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including
any required final revisions, as accepted by my examiners.
I understand that my thesis may be made electronically available to the public.
ii

Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature
software for a long time due to the lack of the development resources to fix them. We propose
a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug
reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch
generation techniques to fix bugs automatically. We evaluate R2Fix on three large and popular
software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs:
buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 60 patches correctly,
5 of which are new patches for bugs that have not been fixed by developers yet. We reported
all 5 new patches to the developers; 4 have already been accepted and committed to the code
repositories. The 60 correct patches generated by R2Fix could have shortened and saved an
average of 68 days of bug diagnosis and patch generation time.
iii

Acknowledgements
I would like to take the opportunity to express my deepest gratitude to my supervisor Prof.
Lin Tan. During the study, she supported me in every aspect. I would like to thank for her
enthusiasm and unwavering support, for the numerous useful guidance, discussions, feedback
and encouragement. The things I learned from her will be extremely beneficial for my future
development.
I am thankful to readers of the thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for
spending their valuable time to review the thesis and give valuable comments.
Thanks to our research group members, especially Tian and Jinqiu. We have been in the
group together for more than one year. I enjoyed discussing with them about topics in scientific
research. Thank them for inspirations and good ideas.
Lastly, and most importantly, I would like to acknowledge my family. My dear mother, the
first teacher and the role model in my life, gives me confidence to explore new things, especially
in a different country far away from my homeland. Thanks to her endless support, sacrifice and
patience. My dear father comes next. He taught me how to develop interests in a scientific area.
He taught me how to overcome difficulties in study and how to solve problems in daily life. To
them I dedicate the thesis.
iv

Table of Contents
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Ideal Goal vs. Realistic Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 A Study of Fix Patterns 6
2.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Fix Pattern Study Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 R2Fix Overview 10
3.1 R2Fix Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1.1 Bug Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 Pattern Parameter Extractor . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Patch Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Bug Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.1 Keyword Search versus Classification . . . . . . . . . . . . . . . . . . . 12
3.2.2 Bug Report Parsing and Classification . . . . . . . . . . . . . . . . . . . 12
v

Citations
More filters
Book ChapterDOI

MUT-APR: MUTation-Based Automated Program Repair Research Tool

TL;DR: MUT-APR is a configurable mutation-based APR tool that varies the APR mechanisms and components to find the best combination for a problem under study andLimitations of the existing framework are discussed.
Journal ArticleDOI

Automated and reliable resource release in device drivers based on dynamic analysis

TL;DR: A novel approach named AutoRR is proposed, which can automatically and reliably release resources based on dynamic analysis, which shows that 40 detected bugs are all successfully and safely tolerated, and the overhead is only 7.84%.

Improving Software Maintenance Ticket Resolution Using Process Mining (Extended Abstract).

Monika Gupta
TL;DR: This thesis addresses a few of the identified challenges pertaining to the software maintenance process by exploring novel applications of process mining and predictive analytics by identifying the potential opportunities for improvement in software process management by mining data repositories.
Journal ArticleDOI

SituRepair: Incorporating machine-learning fault class prediction to inform situational multiple fault automatic program repair

TL;DR: SituRepair as discussed by the authors is a new automatic program repair (APR) technique which is developed to fix multiple-fault (MF) programs, which is intended to discover a solution that fixes multiple faults at a time for a given faulty program.
Journal ArticleDOI

Review4Repair: Code Review Aided Automatic Program Repairing

TL;DR: In this article, the authors investigate the performance improvement of repair techniques using code review comments and train a sequence-to-sequence model on 55,060 code reviews and associated code changes.
References
More filters
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Proceedings ArticleDOI

KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs

TL;DR: A new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs, and significantly beat the coverage of the developers' own hand-written test suite is presented.
Journal ArticleDOI

Programmers use slices when debugging

TL;DR: The experiment reported here shows that programmers also routinely break programs into one kind of coherent piece which is not coniguous.
Related Papers (5)
Frequently Asked Questions (9)
Q1. What are the contributions mentioned in the paper "R2fix: automatically generating bug fixes from bug reports" ?

The authors propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. The authors evaluate R2Fix on three large and popular software projects, i. e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. The authors reported all 5 new patches to the developers ; 4 have already been accepted and committed to the code repositories. 

In the future, the authors plan to generate patches for new types of bug reports, and extend R2Fix to take the output of existing bug detection tools as input to improve the effectiveness of patch generation. In other words, R2Fix does not use the “ Comment ” fields of the bug reports, because the authors want to apply R2Fix as soon as a bug is reported to maximize the time and effort that R2Fix can save for developers in fixing bugs. The authors estimate that it will take you approximately 3 minutes to complete this short survey. In other words, R2Fix does not use the “ Comment ” fields of the bug reports, because the authors want to apply R2Fix as soon as a bug is reported to maximize the time and effort that R2Fix can save for developers in fixing bugs. 

The patch deletes the line that writes 5 bytes to buffer state (denoted by - strcpy(state, "off ");), and adds a new line to write only 4 bytes to state (+ strcpy(state, "off");), which fixes the overflow bug. 

The developers first need to understand this bug report by reading the relevant code together with this report: the buffer state contains only 4 bytes, but 5 bytes, “off \\0”, was written to the buffer, where denotes one space character and the single character ‘\\0’ is needed to mark the end of the string. 

My dear mother, the first teacher and the role model in my life, gives me confidence to explore new things, especially in a different country far away from my homeland. 

Developers’ bug-fixing process is primarily manual; therefore the time required for producing a fix and its accuracy depend on the skill and experience of individuals. 

Developers often spend days, weeks, or even months diagnosing the root cause of a bug by reading the relevant source code, using a debugger to observe and modify the program execution on different inputs, etc. 

After a developer determines the root cause, typically the developer needs to figure out how to modify the buggy code to fix the bug, check out the buggy version of the software, apply the fix, and generate the patch. 

I am thankful to readers of the thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for spending their valuable time to review the thesis and give valuable comments.