
Using Static Analysis to Find Bugs

Abstract
Static analysis examines code in the absence of input data and without running the code. It can detect potential security violations (SQL injection), runtime errors (dereferencing a null pointer) and logical inconsistencies (a conditional test that can't possibly be true). Although a rich body of literature exists on algorithms and analytical frameworks used by such tools, reports describing experiences in industry are much harder to come by. The authors describe FindBugs, an open source static-analysis tool for Java, and experiences using it in production settings. FindBugs evaluates what kinds of defects can be effectively detected with relatively simple techniques and helps developers understand how to incorporate such tools into software development.


Focus: software development tools
Using Static Analysis to Find Bugs
Nathaniel Ayewah and William Pugh, University of Maryland
David Hovemeyer, York College of Pennsylvania
J. David Morgenthaler and John Penix, Google
FindBugs, an open source static-analysis tool for Java, evaluates what kinds of defects can be effectively detected with relatively simple techniques.

Software quality is important, but often imperfect in practice. We can use many
techniques to try to improve quality, including testing, code review, and formal specification. Static-analysis tools evaluate software in the abstract, without running the software or considering a specific input. Rather than trying to prove that the code fulfills its specification, such tools look for violations of reasonable or recommended programming practice. Thus, they look for places in which code might dereference a null pointer or overflow an array. Static-analysis tools might also flag an issue such as a comparison that can’t possibly be true. Although the comparison won’t cause a failure or exception, its existence suggests that it might have resulted from a coding error, leading to incorrect program behavior.
Some tools also flag or enforce programming style issues, such as naming conventions or the use of curly braces in conditionals and looping structures. The lint program for C programs [1] is generally considered the first widely used static-analysis tool for defect detection, although by today’s standards it’s rather limited. Researchers have done significant work in the area over the past decade, driven substantially by concerns over defects that lead to security vulnerabilities, such as buffer overflows, format string vulnerabilities, SQL injection, and cross-site scripting. A vibrant commercial industry has developed around advanced (and expensive) static-analysis tools [2, 3], and several companies have their own proprietary in-house tools, such as Microsoft’s PREfix [4]. Many commercial tools are sophisticated, using deep analysis techniques. Some can use or depend on annotations that describe invariants and other intended software properties that tools can’t easily infer, such as the intended relationship between function parameters.
FindBugs is an example of a static-analysis tool that looks for coding defects [5–7]. The FindBugs project began as an observation, developed into an experiment, and snowballed into a widely used tool with more than half a million downloads worldwide. The observation that started it all was that some Java programs contained blatant mistakes that were detectable with fairly trivial analysis techniques. Initial experiments showed that even production-quality software contained such mistakes and that even experienced developers made them. FindBugs has grown, paying careful attention to mistakes that occur in practice and to the techniques and features needed to effectively incorporate it into production software development.

Here, we review the types of issues FindBugs identifies, discuss the techniques it uses, and look at some experiences using FindBugs on Sun’s Java Development Kit (JDK) and Google’s Java code base.
FindBugs in practice
In its current form, FindBugs recognizes more than 300 programming mistakes and dubious coding idioms that it can identify using simple analysis techniques. FindBugs also uses more sophisticated analysis techniques, devised to help effectively identify certain issues—such as dereferencing of null pointers—that occur frequently enough to warrant their development. Unlike some other tools designed to provide security guarantees, FindBugs doesn’t try to identify all defects in a particular category or prove that software doesn’t contain a particular defect. Rather, it’s designed to effectively identify low-hanging fruit—to cheaply detect defects we believe developers will want to review and remedy.
Many developers use FindBugs ad hoc, and a
growing number of projects and companies are
integrating it into their standard build and testing
systems. Google has incorporated FindBugs into its
standard testing and code-review process and has
fixed more than 1,000 issues in its internal code
base that FindBugs has identified.
Defects in real code
To appreciate static analysis for defect detection
in general, and FindBugs in particular, it helps to
be familiar with some sample defects found in real
code. Let’s look at some examples from Sun’s JDK
1.6.0 implementation, which also are representative
of code seen elsewhere.
One unexpectedly common defect is the infinite recursive loop—that is, a function that always returns the result of invoking itself. We originally extended FindBugs to look for this defect because some freshmen at the University of Maryland had trouble understanding how Java constructors worked. When we ran it against build 13 of Sun’s JDK 1.6, we found five infinite recursive loops, including

public String foundType() {
    return this.foundType();
}
This code should have been a getter method for the field foundType, but the extra parentheses mean it always recursively calls itself until the stack overflows. Various mistakes lead to infinite recursive loops, but the same simple techniques can detect them all. Google has found and fixed more than 70 infinite recursive loops in its code base, and they occur surprisingly frequently in other code bases we’ve examined.
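For contrast, here is a minimal sketch of what the method was presumably meant to do, assuming foundType is a String field (the enclosing class name here is hypothetical):

class TypeInfo {
    private String foundType;

    public String foundType() {
        return this.foundType;   // no parentheses, so this returns the field rather than recursing
    }
}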
Another common bug pattern is when software invokes a method but ignores its return value, despite the fact that doing so makes no sense. An example is the statement s.toLowerCase(), where s is a String. Because Strings in Java are immutable, the toLowerCase() method has no effect on the String it’s invoked on, but rather returns a new String. The developer probably intended to write s = s.toLowerCase(). Another example is when a developer creates an exception but forgets to throw it:

try { ... }
catch (IOException e) {
    new SAXException(....);
}
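As a rough sketch of what the developer presumably intended in both cases (the class and method names below are hypothetical, not from the code in question):

import java.io.IOException;
import java.io.Reader;
import org.xml.sax.SAXException;

class IntendedFixes {
    // toLowerCase() returns a new String, so its result must be kept.
    static String normalize(String s) {
        s = s.toLowerCase();
        return s;
    }

    // The exception wrapper is only useful if it is actually thrown.
    static void parse(Reader in) throws SAXException {
        try {
            in.read();
        } catch (IOException e) {
            throw new SAXException("read failed", e);
        }
    }
}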
FindBugs uses an intraprocedural dataflow analysis to identify places in which the code could dereference a null pointer [5, 7]. Although developers might need to examine dozens of lines to understand some defects reported by FindBugs, most can be understood by examining only a few lines of code. One common case is using the wrong relational or Boolean operation, as in a test to see whether (name != null || name.length > 0). Java evaluates the && and || operators using short-circuit evaluation: the right-hand side is evaluated only if needed in order to determine the expression’s value. In this case, Java will evaluate the expression name.length only when name is null, leading to a null pointer exception. The code would be correct if it had used && rather than ||. FindBugs also identifies situations in which the code checks a value for null in some places and unconditionally dereferences it in others. The following code, for example, checks the variable g to see if it’s null, but if it is null, the next statement will always dereference it, resulting in a null pointer exception:

if (g != null)
    paintScrollBars(g, colors);
g.dispose();
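One plausible repair for each case is sketched below; name is assumed to be an array (consistent with the name.length spelling above), and paintScrollBars stands in for the original painting method:

import java.awt.Graphics;

class NullCheckFixes {
    // With && instead of ||, name.length is read only when name is known to be non-null.
    static boolean hasName(char[] name) {
        return name != null && name.length > 0;
    }

    // Moving g.dispose() inside the guard ensures it runs only when g is non-null.
    static void paint(Graphics g, Object colors) {
        if (g != null) {
            paintScrollBars(g, colors);
            g.dispose();
        }
    }

    private static void paintScrollBars(Graphics g, Object colors) {
        // stand-in for the real painting code
    }
}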
FindBugs also performs an intraprocedural type
analysis that takes into account information from
instanceof tests and finds errors such as checked casts
that always throw a class cast exception. It also
finds places in which two objects guaranteed to be
of unrelated types are compared for equality (for
example, where a StringBuffer is compared to a String,
or the bug Figure 1 shows).
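A minimal illustration of the unrelated-type comparison (hypothetical method names): the first comparison can never be true, while String.contentEquals compares the actual character content.

class UnrelatedTypeComparison {
    // Always false: String.equals() returns false for any argument that isn't a String,
    // so comparing a String to a StringBuffer is almost certainly a mistake.
    static boolean suspicious(String s, StringBuffer sb) {
        return s.equals(sb);
    }

    // The comparison the developer more likely intended.
    static boolean intended(String s, StringBuffer sb) {
        return s.contentEquals(sb);
    }
}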
Many other bug patterns exist, some covering obscure aspects of the Java APIs and languages. A particular pattern might find only one issue in several million lines of code, but collectively these find a significant number of issues. Examples include checking whether a double value is equal to Double.NaN (nothing is equal to Double.NaN, not even Double.NaN) or performing a bit shift of a 32-bit int value by a constant value greater than 31.
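Both patterns in miniature (hypothetical method names); the fixes shown are the usual idioms:

class ObscurePatterns {
    // Always false: NaN compares unequal to everything, including itself.
    static boolean buggyNaNCheck(double d) {
        return d == Double.NaN;
    }

    // The intended check.
    static boolean fixedNaNCheck(double d) {
        return Double.isNaN(d);
    }

    // For a 32-bit int, the shift distance is taken modulo 32, so x << 32 is just x.
    static int buggyShift(int x) {
        return x << 32;
    }
}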
What FindBugs doesn’t find
FindBugs doesn’t look for or report numerous potential defects that more powerful tools report [2–4]. We designed it this way for two reasons: to keep the analysis relatively simple and to avoid generating too many warnings that don’t correspond to true defects.
One such case is finding null pointer dereferences that occur only if a particular path through the program is executed. Reasoning reported such an issue in Apache Tomcat 4.1.24 [8]. Reasoning warns that if the body of the first if statement isn’t executed but the body of the second if statement is executed, then a null pointer exception will occur:

HttpServletResponse hres = null;
if (sres instanceof HttpServletResponse)
    hres = (HttpServletResponse) sres;
// Check to see if available
if (!(...).getAvailable()) {
    hres.sendError(...)
The problem is that the analysis doesn’t know whether that path is feasible. Perhaps the condition in the second statement can be true only if the condition in the first statement is true. In some cases, the conditions might be closely related and some simple theorem proving can show whether the path is feasible or infeasible. But showing that a particular path is feasible can be much harder, and is in general undecidable.
Rather than worry about whether particular paths are feasible, FindBugs looks for branches or statements that, if executed, guarantee that a null pointer exception will occur. We’ve found that almost all null pointer issues we report are either real bugs or inconsistent code with branches or statements that can’t be executed. Code that is merely inconsistent might not be changed if it’s already used in production, but generally would be considered unacceptable in new code if found during code review.
We also haven’t pursued checks for array indices that are out of bounds. Detecting such errors requires tracking relations between various variables (for instance, is i less than the length of a), and can become arbitrarily complicated. Some simple techniques might accurately report some obvious bugs, but we haven’t yet investigated this.
FindBugs nuts and bolts
FindBugs has a plug-in architecture in which detectors can be defined, each of which might report several different bug patterns. Rather than use a pattern language to describe bugs (as PMD [9] and Metal [10] do), FindBugs detectors are simply written in Java using various techniques. Many simple detectors use a visitor pattern over the class files or method byte codes. Detectors can access information about types, constant values, and special flags, as well as values stored on the stack or in local variables.
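The sketch below illustrates the general bytecode-visitor idea using the ASM library rather than FindBugs’ own plug-in API (whose class names and reporting calls differ). It flags one narrow case of the ignored-return-value pattern: a call to String.toLowerCase() whose result is immediately popped off the stack. All class and method names in the sketch are illustrative, not part of FindBugs.

import java.io.IOException;
import java.io.InputStream;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class IgnoredToLowerCaseScanner extends ClassVisitor {
    private String className;

    public IgnoredToLowerCaseScanner() {
        super(Opcodes.ASM9);
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        className = name;
    }

    @Override
    public MethodVisitor visitMethod(int access, String methodName, String descriptor,
                                     String signature, String[] exceptions) {
        return new MethodVisitor(Opcodes.ASM9) {
            private boolean lastWasToLowerCase;

            @Override
            public void visitMethodInsn(int opcode, String owner, String name,
                                        String desc, boolean isInterface) {
                // Remember whether the previous instruction was a String.toLowerCase() call.
                lastWasToLowerCase = opcode == Opcodes.INVOKEVIRTUAL
                        && "java/lang/String".equals(owner)
                        && "toLowerCase".equals(name);
            }

            @Override
            public void visitInsn(int opcode) {
                // A POP immediately after the call means the returned String is discarded.
                // (A real detector would also reset the flag on every other instruction kind.)
                if (lastWasToLowerCase && opcode == Opcodes.POP) {
                    System.out.printf("Ignored result of String.toLowerCase() in %s.%s%n",
                            className, methodName);
                }
                lastWasToLowerCase = false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Scan a class on the classpath, for example: java IgnoredToLowerCaseScanner java.lang.String
        try (InputStream in = ClassLoader.getSystemResourceAsStream(
                args[0].replace('.', '/') + ".class")) {
            new ClassReader(in).accept(new IgnoredToLowerCaseScanner(), 0);
        }
    }
}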
Detectors can also traverse the control-flow graph, using the results of data-flow analysis such as type information, constant values, and nullness. The data-flow algorithms all generally use information from conditional tests, so that the analysis results incorporate information from instanceof and null tests.
FindBugs doesn’t perform interprocedural context-sensitive analysis. However, many detectors use global information, such as subtype relationships and fields accessed across the entire application. A few detectors use interprocedural summary information, such as which method parameters are always dereferenced.
FindBugs groups each bug pattern into a category (such as correctness, bad practice, performance, and internationalization) and assigns each bug pattern report either high, medium, or low priority. FindBugs determines priorities via heuristics unique to each detector or pattern that aren’t necessarily comparable across bug patterns. In normal operation, FindBugs doesn’t report low-priority warnings.
The most important aspect of the FindBugs project is perhaps how we develop new bug detectors: we start with real bugs and develop the simplest possible technique that effectively finds such bugs. This approach often lets us go from finding a particular instance of a bug to implementing a detector that can effectively find instances of it within hours. Many bugs are quite simple—one bug pattern most recently added to FindBugs occurs when the code casts an int value to a char and checks the result to see whether it’s -1. Because the char type in Java is unsigned, this check will never be true. A post on http://worsethanfailure.com inspired this bug detector, and within less than an hour, we had implemented a detector that found 11 such errors in Eclipse 3.3M6.

[Figure 1. The FindBugs Swing GUI. The interface shows FindBugs reviewing a bug in Sun’s Java Development Kit.]
We can run FindBugs from the command line, using Ant or Maven, within Eclipse or NetBeans, or in a stand-alone GUI (see Figure 1). We can save the analysis results in XML, which we can then further filter, transform, or import into a database. FindBugs supports two mechanisms that let users and tools identify corresponding warnings from different analysis runs, even if line numbers and other program artifacts have changed [6]. This lets tools determine which issues are new and track audits and human reviews.
FindBugs experiences
We’ve evaluated the issues FindBugs uncovered in Sun’s JDK 1.6.0 implementation elsewhere [11]. To briefly summarize, we looked at each FindBugs medium- or high-priority correctness warning that was in one build and not reported in the next, even though the class containing the warning was still present. Out of 53 such warning removals, 37 were due to a small targeted program change that seemed to narrowly focus on remedying the issue the warning described. Five were program changes that changed the code such that FindBugs no longer reported the issue, even though the change didn’t completely address aspects of the underlying issue. The remaining 11 warnings disappeared owing to substantial changes or refactorings that had a larger scope than just removing the one defect.
In previous research, we also manually evaluated all the medium- and high-priority correctness warnings in build 105 (the official release of Java 1.6.0). We classified the 379 medium- and high-priority correctness warnings as follows:

- 5 occurred owing to bad analysis on FindBugs’ part (in one case, it didn’t understand that a method call could change a field);
- 160 were in unreachable code or likely to have little or no functional impact;
- 176 seemed to have functional impact; and
- 38 seemed to have substantial functional impact—that is, the method containing the warning would clearly behave in a way substantially at odds with its intended function.
A detailed breakdown of the defect classification associated with each bug pattern appears in our previous paper [11]. Clearly, any such classification is open to interpretation, and other reviewers would likely produce slightly different classifications. Also, our assessment of functional impact might differ from the actual end-user perspective. For example, even if a method is clearly broken, it might never be called or be invokable by user code. However, given many bug patterns’ localized nature, we have some confidence in our classifications’ general soundness.
Experiences at Google
Google’s use of FindBugs has evolved over the past two years in three distinct phases. We used the lessons learned during each phase to plan and develop the next one.
The first phase involved automating FindBugs to run over all newly checked-in Java source code and store any generated warnings. A simple Web interface let developers check projects for possible bugs and mark false positives. Our initial database couldn’t track warnings over different versions, so the Web interface saw little use. Developers couldn’t determine which warnings applied to which file versions or whether the warnings were fresh or stale. When a defect was fixed, this event wasn’t reported by our process. Such stale warnings have a greater negative impact on the developer’s user experience than a false positive. Successfully injecting FindBugs into Google’s development process required more than just making all warnings available outside an engineer’s normal workflow.
In our project’s second phase, we implemented a service model in which two of the authors (David Morgenthaler and John Penix) spent half the time evaluating warnings and reporting those we decided were significant defects in Google’s bug-tracking systems. Over the next six months, we evaluated several thousand FindBugs warnings and filed more than 1,000 bug reports. At first, this effort focused on bug patterns we chose on the basis of our own opinions about their importance. As we gained experience and developer feedback, we prioritized our evaluation on the basis of our prior empirical results. We ranked the different patterns using both the observed false-positive rate and the observed fix rate for issues we filed as bugs. Thus, we spent more time evaluating warnings that developers were more likely to fix. This ranking scheme carried over into the third phase, as we noticed that our service model wouldn’t scale well as Google grew.
We observed that, in many cases, filing a bug report was more effort than simply fixing the code.
To better scale the operation, we needed to move the analysis feedback closer to the development workflow. In the third and current phase, we exploit Google’s code-review policy and tools. Before a developer checks code changes into Google’s source-control system, another developer must first review them. Different tools help support this process, including Mondrian, a sophisticated, internal Web-based review tool [12].
Mondrian lets reviewers add inline comments to code that are visible to other Mondrian users, including the original requester. Engineers discuss the code using these comments and note completed modifications. For example, a reviewer might request in an inline comment, “Please rename this variable.” In response, the developer would make the requested change and reply to the original comment with an inline “Done.” We let Mondrian users see FindBugs, and other static-analysis warnings, as inline comments from our automated reviewer, BugBot. We provide a false-positive suppression mechanism and let developers filter the comments displayed by confidence from highest to lowest. Users select the minimum confidence level they wish to see, which suppresses all lower-ranked warnings.
This system scales quite well, and we’ve seen more than 200 users verify or suppress thousands of warnings in the past six months. We must still make some improvements, such as automatically running FindBugs on each development version of a file while developers are reviewing it and before they check it in. The main lesson we learned from this experience is that developers will pay attention to, and fix, FindBugs warnings if they appear seamlessly within the workflow. It helps that code reviewers can also see the warnings and request fixes as they review the code. Our ranking and false-positive suppression mechanisms are crucial to keeping the displayed warnings relevant and valuable so that users don’t start ignoring the more recent, important warnings along with the older, more trivial ones.
Survey of FindBugs users
Many studies on static-analysis tools focus on their correctness (are the warnings they identify real problems?), their completeness (do they find all problems in a given category?), or their performance in terms of memory and speed. As organizations begin integrating these tools into their software processes, we must consider other aspects of the interactions between these tools and users or processes. Do these tools slow down the process with unnecessary warnings, or is the value they provide (in terms of problems found) worth the investment in time? What’s the best way to integrate these tools into a given process? Should all developers interact with the tools, or should quality assurance specialists winnow out less useful warnings?

Few rules of thumb exist about the best ways to use static-analysis tools. Rather, different software teams use a hodgepodge of methods. Many users don’t even have a formal process for finding defects using tools—they run the tools only occasionally and aren’t consistent in how they respond to warnings. In the end, users might not derive full value from static-analysis tools, and some might discontinue their use, incorrectly perceiving that they lack value.
The FindBugs team has started a project that aims to identify and evaluate tool features, validate or invalidate assumptions tool vendors hold, and guide individuals and teams wanting to use static-analysis tools effectively. At this early stage, it isn’t clear what the problems are and what questions we should investigate in more depth. So, we’re conducting some surveys and interviews to get qualitative feedback from FindBugs users. We want to determine who our users are, how they use FindBugs, how they integrate it into their processes, and what their perception of its effectiveness is. Beyond surveys and interviews, we hope to spend time observing users in their work environments to capture the nuances in their interactions with this tool.
The following sections detail some observations
from the surveys and interviews.
On FindBugs’ utility and impact. The central challenge for tool creators is to identify warnings that users are concerned with. Tools such as FindBugs assess each warning on the basis of its severity (how serious the problem is in general) and the tool’s confidence
Table 1. Users that review at least high-priority warnings for each category (out of 252)

Bug category reviewed          Percentage of users
Bad practice                   96
Performance                    96
Correctness                    95
Multithreaded correctness      93
Malicious code vulnerability   86
Dodgy                          86
Internationalization           57

References

- “A static analyzer for finding dynamic programming errors” (journal article).
- “Evaluating static analysis defect warnings on production software” (conference paper).
- Brian Chess et al., Secure Programming with Static Analysis (book).
- “Finding more null pointer bugs, but not too many” (conference paper).
- “Evaluating and tuning a static analysis to find null pointer bugs” (journal article).