
R2Fix: Automatically Generating Bug Fixes from Bug Reports

Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to the lack of the development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 57 patches correctly, 5 of which are new patches for bugs that have not been fixed by developers yet. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 57 correct patches generated by R2Fix could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.


R2Fix: Automatically Generating Bug Fixes from Bug Reports
by
Chen Liu
A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2012
© Chen Liu 2012

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including
any required final revisions, as accepted by my examiners.
I understand that my thesis may be made electronically available to the public.

Abstract
Many bugs, even those that are known and documented in bug reports, remain in mature
software for a long time due to the lack of the development resources to fix them. We propose
a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug
reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch
generation techniques to fix bugs automatically. We evaluate R2Fix on three large and popular
software projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs:
buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 60 patches correctly,
5 of which are new patches for bugs that have not been fixed by developers yet. We reported
all 5 new patches to the developers; 4 have already been accepted and committed to the code
repositories. The 60 correct patches generated by R2Fix could have shortened and saved an
average of 68 days of bug diagnosis and patch generation time.

Acknowledgements
I would like to take the opportunity to express my deepest gratitude to my supervisor, Prof.
Lin Tan. During my studies, she supported me in every aspect. I would like to thank her for her
enthusiasm and unwavering support, and for her guidance, discussions, feedback
and encouragement. The things I learned from her will be extremely beneficial for my future
development.
I am thankful to readers of the thesis, Prof. Patrick Lam and Prof. Mahesh V. Tripunitara, for
spending their valuable time to review the thesis and give valuable comments.
Thanks to our research group members, especially Tian and Jinqiu. We have been in the
group together for more than one year. I enjoyed discussing research topics with them,
and I thank them for their inspiration and good ideas.
Lastly, and most importantly, I would like to acknowledge my family. My dear mother, the
first teacher and the role model in my life, gives me confidence to explore new things, especially
in a different country far away from my homeland. Thanks to her endless support, sacrifice and
patience. My dear father comes next. He taught me how to develop interests in a scientific area.
He taught me how to overcome difficulties in study and how to solve problems in daily life. To
them I dedicate the thesis.

Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Ideal Goal vs. Realistic Goal
1.2 Challenges
1.3 Contributions
2 A Study of Fix Patterns
2.1 Data Collection
2.2 Fix Pattern Study Results
3 R2Fix Overview
3.1 R2Fix Architecture
3.1.1 Bug Classifiers
3.1.2 Pattern Parameter Extractor
3.1.3 Patch Generator
3.2 Bug Classifiers
3.2.1 Keyword Search versus Classification
3.2.2 Bug Report Parsing and Classification

In the future, the authors plan to generate patches for new types of bug reports, and extend R2Fix to take the output of existing bug detection tools as input to improve the effectiveness of patch generation. R2Fix does not use the “Comment” fields of the bug reports, because the authors want to apply R2Fix as soon as a bug is reported to maximize the time and effort that R2Fix can save for developers in fixing bugs.

The patch deletes the line that writes 5 bytes to buffer state (denoted by - strcpy(state, "off ");), and adds a new line to write only 4 bytes to state (+ strcpy(state, "off");), which fixes the overflow bug. 

The developers first need to understand this bug report by reading the relevant code together with this report: the buffer state contains only 4 bytes, but 5 bytes, “off␣\0”, were written to the buffer, where ␣ denotes one space character and the single character ‘\0’ is needed to mark the end of the string.


Developers’ bug-fixing process is primarily manual; therefore the time required for producing a fix and its accuracy depend on the skill and experience of individuals. 

Developers often spend days, weeks, or even months diagnosing the root cause of a bug by reading the relevant source code, using a debugger to observe and modify the program execution on different inputs, etc. 

After a developer determines the root cause, typically the developer needs to figure out how to modify the buggy code to fix the bug, check out the buggy version of the software, apply the fix, and generate the patch. 
