
Using Static Analysis to Find Bugs

Abstract
Static analysis examines code in the absence of input data and without running the code. It can detect potential security violations (SQL injection), runtime errors (dereferencing a null pointer) and logical inconsistencies (a conditional test that can't possibly be true). Although a rich body of literature exists on algorithms and analytical frameworks used by such tools, reports describing experiences in industry are much harder to come by. The authors describe FindBugs, an open source static-analysis tool for Java, and experiences using it in production settings. FindBugs evaluates what kinds of defects can be effectively detected with relatively simple techniques and helps developers understand how to incorporate such tools into software development.


Focus: software development tools
Using Static Analysis to Find Bugs
Nathaniel Ayewah and William Pugh, University of Maryland
David Hovemeyer, York College of Pennsylvania
J. David Morgenthaler and John Penix, Google
FindBugs, an open source static-analysis tool for Java, evaluates what kinds of defects can be effectively detected with relatively simple techniques.

Software quality is important, but often imperfect in practice. We can use many
techniques to try to improve quality, including testing, code review, and formal specification. Static-analysis tools evaluate software in the abstract, without running the software or considering a specific input. Rather than trying to prove that the code fulfills its specification, such tools look for violations of reasonable or recommended programming practice. Thus, they look for places in which code might dereference a null pointer or overflow an array. Static-analysis tools might also flag an issue such as a comparison that can’t possibly be true. Although the comparison won’t cause a failure or exception, its existence suggests that it might have resulted from a coding error, leading to incorrect program behavior.
Some tools also flag or enforce programming style issues, such as naming conventions or the use of curly braces in conditionals and looping structures. The lint program for C programs [1] is generally considered the first widely used static-analysis tool for defect detection, although by today’s standards it’s rather limited. Researchers have done significant work in the area over the past decade, driven substantially by concerns over defects that lead to security vulnerabilities, such as buffer overflows, format string vulnerabilities, SQL injection, and cross-site scripting. A vibrant commercial industry has developed around advanced (and expensive) static-analysis tools [2, 3], and several companies have their own proprietary in-house tools, such as Microsoft’s PREfix [4]. Many commercial tools are sophisticated, using deep analysis techniques. Some can use or depend on annotations that describe invariants and other intended software properties that tools can’t easily infer, such as the intended relationship between function parameters.
FindBugs is an example of a static-analysis tool that looks for coding defects [5–7]. The FindBugs project began as an observation, developed into an experiment, and snowballed into a widely used tool with more than half a million downloads worldwide. The observation that started it all was that some Java programs contained blatant mistakes that were detectable with fairly trivial analysis techniques. Initial experiments showed that even production-quality software contained such mistakes and that even experienced developers made them. FindBugs has grown, paying careful attention to mistakes that occur in practice and to the techniques and features needed to effectively incorporate it into production software development.

Here, we review the types of issues FindBugs identifies, discuss the techniques it uses, and look at some experiences using FindBugs on Sun’s Java Development Kit (JDK) and Google’s Java code base.
FindBugs in practice
In its current form, FindBugs recognizes more than 300 programming mistakes and dubious coding idioms that it can identify using simple analysis techniques. FindBugs also uses more sophisticated analysis techniques, devised to help effectively identify certain issues—such as dereferencing of null pointers—that occur frequently enough to warrant their development. Unlike some other tools designed to provide security guarantees, FindBugs doesn’t try to identify all defects in a particular category or prove that software doesn’t contain a particular defect. Rather, it’s designed to effectively identify low-hanging fruit—to cheaply detect defects we believe developers will want to review and remedy.
Many developers use FindBugs ad hoc, and a
growing number of projects and companies are
integrating it into their standard build and testing
systems. Google has incorporated FindBugs into its
standard testing and code-review process and has
fixed more than 1,000 issues in its internal code
base that FindBugs has identified.
Defects in real code
To appreciate static analysis for defect detection
in general, and FindBugs in particular, it helps to
be familiar with some sample defects found in real
code. Let’s look at some examples from Sun’s JDK
1.6.0 implementation, which also are representative
of code seen elsewhere.
One unexpectedly common defect is the infinite recursive loop—that is, a function that always returns the result of invoking itself. We originally extended FindBugs to look for this defect because some freshmen at the University of Maryland had trouble understanding how Java constructors worked. When we ran it against build 13 of Sun’s JDK 1.6, we found five infinite recursive loops, including

public String foundType() {
    return this.foundType();
}
This code should have been a getter method for the field foundType, but the extra parentheses mean it always recursively calls itself until the stack overflows. Various mistakes lead to infinite recursive loops, but the same simple techniques can detect them all. Google has found and fixed more than 70 infinite recursive loops in its code base, and they occur surprisingly frequently in other code bases we’ve examined.
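For contrast, here is a minimal sketch of what the method was presumably meant to do, assuming foundType is a String field (the enclosing class name here is hypothetical):

class TypeInfo {
    private String foundType;

    public String foundType() {
        return this.foundType;   // no parentheses, so this returns the field rather than recursing
    }
}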
Another common bug pattern is when software invokes a method but ignores its return value, despite the fact that doing so makes no sense. An example is the statement s.toLowerCase(), where s is a String. Because Strings in Java are immutable, the toLowerCase() method has no effect on the String it’s invoked on, but rather returns a new String. The developer probably intended to write s = s.toLowerCase(). Another example is when a developer creates an exception but forgets to throw it:

try { ... }
catch (IOException e) {
    new SAXException(....);
}
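As a rough sketch of what the developer presumably intended in both cases (the class and method names below are hypothetical, not from the code in question):

import java.io.IOException;
import java.io.Reader;
import org.xml.sax.SAXException;

class IntendedFixes {
    // toLowerCase() returns a new String, so its result must be kept.
    static String normalize(String s) {
        s = s.toLowerCase();
        return s;
    }

    // The exception wrapper is only useful if it is actually thrown.
    static void parse(Reader in) throws SAXException {
        try {
            in.read();
        } catch (IOException e) {
            throw new SAXException("read failed", e);
        }
    }
}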
FindBugs uses an intraprocedural dataflow analysis to identify places in which the code could dereference a null pointer [5, 7]. Although developers might need to examine dozens of lines to understand some defects reported by FindBugs, most can be understood by examining only a few lines of code. One common case is using the wrong relational or Boolean operation, as in a test to see whether (name != null || name.length > 0). Java evaluates the && and || operators using short-circuit evaluation: the right-hand side is evaluated only if needed in order to determine the expression’s value. In this case, Java will evaluate the expression name.length only when name is null, leading to a null pointer exception. The code would be correct if it had used && rather than ||. FindBugs also identifies situations in which the code checks a value for null in some places and unconditionally dereferences it in others. The following code, for example, checks the variable g to see if it’s null, but if it is null, the next statement will always dereference it, resulting in a null pointer exception:

if (g != null)
    paintScrollBars(g, colors);
g.dispose();
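One plausible repair for each case is sketched below; name is assumed to be an array (consistent with the name.length spelling above), and paintScrollBars stands in for the original painting method:

import java.awt.Graphics;

class NullCheckFixes {
    // With && instead of ||, name.length is read only when name is known to be non-null.
    static boolean hasName(char[] name) {
        return name != null && name.length > 0;
    }

    // Moving g.dispose() inside the guard ensures it runs only when g is non-null.
    static void paint(Graphics g, Object colors) {
        if (g != null) {
            paintScrollBars(g, colors);
            g.dispose();
        }
    }

    private static void paintScrollBars(Graphics g, Object colors) {
        // stand-in for the real painting code
    }
}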
FindBugs also performs an intraprocedural type
analysis that takes into account information from
instanceof tests and finds errors such as checked casts
that always throw a class cast exception. It also
finds places in which two objects guaranteed to be
of unrelated types are compared for equality (for
example, where a StringBuffer is compared to a String,
or the bug Figure 1 shows).
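A minimal illustration of the unrelated-type comparison (hypothetical method names): the first comparison can never be true, while String.contentEquals compares the actual character content.

class UnrelatedTypeComparison {
    // Always false: String.equals() returns false for any argument that isn't a String,
    // so comparing a String to a StringBuffer is almost certainly a mistake.
    static boolean suspicious(String s, StringBuffer sb) {
        return s.equals(sb);
    }

    // The comparison the developer more likely intended.
    static boolean intended(String s, StringBuffer sb) {
        return s.contentEquals(sb);
    }
}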
Many other bug patterns exist, some covering obscure aspects of the Java APIs and languages. A particular pattern might find only one issue in several million lines of code, but collectively these find a significant number of issues. Examples include checking whether a double value is equal to Double.NaN (nothing is equal to Double.NaN, not even Double.NaN) or performing a bit shift of a 32-bit int value by a constant value greater than 31.
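Both patterns in miniature (hypothetical method names); the fixes shown are the usual idioms:

class ObscurePatterns {
    // Always false: NaN compares unequal to everything, including itself.
    static boolean buggyNaNCheck(double d) {
        return d == Double.NaN;
    }

    // The intended check.
    static boolean fixedNaNCheck(double d) {
        return Double.isNaN(d);
    }

    // For a 32-bit int, the shift distance is taken modulo 32, so x << 32 is just x.
    static int buggyShift(int x) {
        return x << 32;
    }
}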
What FindBugs doesn’t find
FindBugs doesn’t look for or report numerous potential defects that more powerful tools report [2–4]. We designed it this way for two reasons: to keep the analysis relatively simple and to avoid generating too many warnings that don’t correspond to true defects.
One such case is finding null pointer dereferences that occur only if a particular path through the program is executed. Reasoning reported such an issue in Apache Tomcat 4.1.24 [8]. Reasoning warns that if the body of the first if statement isn’t executed but the body of the second if statement is executed, then a null pointer exception will occur:

HttpServletResponse hres = null;
if (sres instanceof HttpServletResponse)
    hres = (HttpServletResponse) sres;
// Check to see if available
if (!(...).getAvailable()) {
    hres.sendError(...)
The problem is that the analysis doesn’t know whether that path is feasible. Perhaps the condition in the second statement can be true only if the condition in the first statement is true. In some cases, the conditions might be closely related and some simple theorem proving can show whether the path is feasible or infeasible. But showing that a particular path is feasible can be much harder, and is in general undecidable.
Rather than worry about whether particular paths are feasible, FindBugs looks for branches or statements that, if executed, guarantee that a null pointer exception will occur. We’ve found that almost all null pointer issues we report are either real bugs or inconsistent code with branches or statements that can’t be executed. Code that is merely inconsistent might not be changed if it’s already used in production, but generally would be considered unacceptable in new code if found during code review.
We also haven’t pursued checks for array indices that are out of bounds. Detecting such errors requires tracking relations between various variables (for instance, is i less than the length of a), and can become arbitrarily complicated. Some simple techniques might accurately report some obvious bugs, but we haven’t yet investigated this.
FindBugs nuts and bolts
FindBugs has a plug-in architecture in which detectors can be defined, each of which might report several different bug patterns. Rather than use a pattern language to describe bugs (as PMD [9] and Metal [10] do), FindBugs detectors are simply written in Java using various techniques. Many simple detectors use a visitor pattern over the class files or method byte codes. Detectors can access information about types, constant values, and special flags, as well as values stored on the stack or in local variables.
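The sketch below illustrates the general bytecode-visitor idea using the ASM library rather than FindBugs’ own plug-in API (whose class names and reporting calls differ). It flags one narrow case of the ignored-return-value pattern: a call to String.toLowerCase() whose result is immediately popped off the stack. All class and method names in the sketch are illustrative, not part of FindBugs.

import java.io.IOException;
import java.io.InputStream;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class IgnoredToLowerCaseScanner extends ClassVisitor {
    private String className;

    public IgnoredToLowerCaseScanner() {
        super(Opcodes.ASM9);
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        className = name;
    }

    @Override
    public MethodVisitor visitMethod(int access, String methodName, String descriptor,
                                     String signature, String[] exceptions) {
        return new MethodVisitor(Opcodes.ASM9) {
            private boolean lastWasToLowerCase;

            @Override
            public void visitMethodInsn(int opcode, String owner, String name,
                                        String desc, boolean isInterface) {
                // Remember whether the previous instruction was a String.toLowerCase() call.
                lastWasToLowerCase = opcode == Opcodes.INVOKEVIRTUAL
                        && "java/lang/String".equals(owner)
                        && "toLowerCase".equals(name);
            }

            @Override
            public void visitInsn(int opcode) {
                // A POP immediately after the call means the returned String is discarded.
                // (A real detector would also reset the flag on every other instruction kind.)
                if (lastWasToLowerCase && opcode == Opcodes.POP) {
                    System.out.printf("Ignored result of String.toLowerCase() in %s.%s%n",
                            className, methodName);
                }
                lastWasToLowerCase = false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Scan a class on the classpath, for example: java IgnoredToLowerCaseScanner java.lang.String
        try (InputStream in = ClassLoader.getSystemResourceAsStream(
                args[0].replace('.', '/') + ".class")) {
            new ClassReader(in).accept(new IgnoredToLowerCaseScanner(), 0);
        }
    }
}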
Detectors can also traverse the control-flow graph, using the results of data-flow analysis such as type information, constant values, and nullness. The data-flow algorithms all generally use information from conditional tests, so that the analysis results incorporate information from instanceof and null tests.
FindBugs doesn’t perform interprocedural context-sensitive analysis. However, many detectors use global information, such as subtype relationships and fields accessed across the entire application. A few detectors use interprocedural summary information, such as which method parameters are always dereferenced.
FindBugs groups each bug pattern into a category (such as correctness, bad practice, performance, and internationalization) and assigns each bug pattern report either high, medium, or low priority. FindBugs determines priorities via heuristics unique to each detector or pattern that aren’t necessarily comparable across bug patterns. In normal operation, FindBugs doesn’t report low-priority warnings.
The most important aspect of the FindBugs project is perhaps how we develop new bug detectors: we start with real bugs and develop the simplest possible technique that effectively finds such bugs. This approach often lets us go from finding a particular instance of a bug to implementing a detector that can effectively find instances of it within hours. Many bugs are quite simple—one bug pattern most recently added to FindBugs occurs when the code casts an int value to a char and checks the result to see whether it’s -1. Because the char type in Java is unsigned, this check will never be true. A post on http://worsethanfailure.com inspired this bug detector, and within less than an hour, we had implemented a detector that found 11 such errors in Eclipse 3.3M6.

[Figure 1. The FindBugs Swing GUI. The interface shows FindBugs reviewing a bug in Sun’s Java Development Kit.]
We can run FindBugs from the command line, using Ant or Maven, within Eclipse or NetBeans, or in a stand-alone GUI (see Figure 1). We can save the analysis results in XML, which we can then further filter, transform, or import into a database. FindBugs supports two mechanisms that let users and tools identify corresponding warnings from different analysis runs, even if line numbers and other program artifacts have changed [6]. This lets tools determine which issues are new and track audits and human reviews.
FindBugs experiences
We’ve evaluated the issues FindBugs uncovered in Sun’s JDK 1.6.0 implementation elsewhere [11]. To briefly summarize, we looked at each FindBugs medium- or high-priority correctness warning that was in one build and not reported in the next, even though the class containing the warning was still present. Out of 53 such warning removals, 37 were due to a small targeted program change that seemed to narrowly focus on remedying the issue the warning described. Five were program changes that changed the code such that FindBugs no longer reported the issue, even though the change didn’t completely address aspects of the underlying issue. The remaining 11 warnings disappeared owing to substantial changes or refactorings that had a larger scope than just removing the one defect.
In previous research, we also manually evaluated all the medium- and high-priority correctness warnings in build 105 (the official release of Java 1.6.0). We classified the 379 medium- and high-priority correctness warnings as follows:

- 5 occurred owing to bad analysis on FindBugs’ part (in one case, it didn’t understand that a method call could change a field);
- 160 were in unreachable code or likely to have little or no functional impact;
- 176 seemed to have functional impact; and
- 38 seemed to have substantial functional impact—that is, the method containing the warning would clearly behave in a way substantially at odds with its intended function.
A detailed breakdown of the defect classification associated with each bug pattern appears in our previous paper [11]. Clearly, any such classification is open to interpretation, and other reviewers would likely produce slightly different classifications. Also, our assessment of functional impact might differ from the actual end-user perspective. For example, even if a method is clearly broken, it might never be called or be invokable by user code. However, given many bug patterns’ localized nature, we have some confidence in our classifications’ general soundness.
Experiences at Google
Google’s use of FindBugs has evolved over the past two years in three distinct phases. We used the lessons learned during each phase to plan and develop the next one.
The first phase involved automating FindBugs to run over all newly checked-in Java source code and store any generated warnings. A simple Web interface let developers check projects for possible bugs and mark false positives. Our initial database couldn’t track warnings over different versions, so the Web interface saw little use. Developers couldn’t determine which warnings applied to which file versions or whether the warnings were fresh or stale. When a defect was fixed, this event wasn’t reported by our process. Such stale warnings have a greater negative impact on the developer’s user experience than a false positive. Successfully injecting FindBugs into Google’s development process required more than just making all warnings available outside an engineer’s normal workflow.
In our project’s second phase, we implemented a service model in which two of the authors (David Morgenthaler and John Penix) spent half the time evaluating warnings and reporting those we decided were significant defects in Google’s bug-tracking systems. Over the next six months, we evaluated several thousand FindBugs warnings and filed more than 1,000 bug reports. At first, this effort focused on bug patterns we chose on the basis of our own opinions about their importance. As we gained experience and developer feedback, we prioritized our evaluation on the basis of our prior empirical results. We ranked the different patterns using both the observed false-positive rate and the observed fix rate for issues we filed as bugs. Thus, we spent more time evaluating warnings that developers were more likely to fix. This ranking scheme carried over into the third phase, as we noticed that our service model wouldn’t scale well as Google grew.
We observed that, in many cases, filing a bug report was more effort than simply fixing the code.
To better scale the operation, we needed to move the analysis feedback closer to the development workflow. In the third and current phase, we exploit Google’s code-review policy and tools. Before a developer checks code changes into Google’s source-control system, another developer must first review them. Different tools help support this process, including Mondrian, a sophisticated, internal Web-based review tool [12].
Mondrian lets reviewers add inline comments to code that are visible to other Mondrian users, including the original requester. Engineers discuss the code using these comments and note completed modifications. For example, a reviewer might request in an inline comment, “Please rename this variable.” In response, the developer would make the requested change and reply to the original comment with an inline “Done.” We let Mondrian users see FindBugs, and other static-analysis warnings, as inline comments from our automated reviewer, BugBot. We provide a false-positive suppression mechanism and let developers filter the comments displayed by confidence from highest to lowest. Users select the minimum confidence level they wish to see, which suppresses all lower-ranked warnings.
This system scales quite well, and we’ve seen more than 200 users verify or suppress thousands of warnings in the past six months. We must still make some improvements, such as automatically running FindBugs on each development version of a file while developers are reviewing it and before they check it in. The main lesson we learned from this experience is that developers will pay attention to, and fix, FindBugs warnings if they appear seamlessly within the workflow. It helps that code reviewers can also see the warnings and request fixes as they review the code. Our ranking and false-positive suppression mechanisms are crucial to keeping the displayed warnings relevant and valuable so that users don’t start ignoring the more recent, important warnings along with the older, more trivial ones.
Survey of FindBugs users
Many studies on static-analysis tools focus on their correctness (are the warnings they identify real problems?), their completeness (do they find all problems in a given category?), or their performance in terms of memory and speed. As organizations begin integrating these tools into their software processes, we must consider other aspects of the interactions between these tools and users or processes. Do these tools slow down the process with unnecessary warnings, or is the value they provide (in terms of problems found) worth the investment in time? What’s the best way to integrate these tools into a given process? Should all developers interact with the tools, or should quality assurance specialists winnow out less useful warnings?

Few rules of thumb exist about the best ways to use static-analysis tools. Rather, different software teams use a hodgepodge of methods. Many users don’t even have a formal process for finding defects using tools—they run the tools only occasionally and aren’t consistent in how they respond to warnings. In the end, users might not derive full value from static-analysis tools, and some might discontinue their use, incorrectly perceiving that they lack value.
The FindBugs team has started a project that aims to identify and evaluate tool features, validate or invalidate assumptions tool vendors hold, and guide individuals and teams wanting to use static-analysis tools effectively. At this early stage, it isn’t clear what the problems are and what questions we should investigate in more depth. So, we’re conducting some surveys and interviews to get qualitative feedback from FindBugs users. We want to determine who our users are, how they use FindBugs, how they integrate it into their processes, and what their perception of its effectiveness is. Beyond surveys and interviews, we hope to spend time observing users in their work environments to capture the nuances in their interactions with this tool.
The following sections detail some observations
from the surveys and interviews.
On FindBugs’ utility and impact. The central challenge for tool creators is to identify warnings that users are concerned with. Tools such as FindBugs assess each warning on the basis of its severity (how serious the problem is in general) and the tool’s confidence
Table 1. Users that review at least high-priority warnings for each category (out of 252)

Bug category reviewed          Percentage of users
Bad practice                   96
Performance                    96
Correctness                    95
Multithreaded correctness      93
Malicious code vulnerability   86
Dodgy                          86
Internationalization           57

References

- “A static analyzer for finding dynamic programming errors” (journal article).
- “Evaluating static analysis defect warnings on production software” (conference paper).
- Brian Chess et al., Secure Programming with Static Analysis (book).
- “Finding more null pointer bugs, but not too many” (conference paper).
- “Evaluating and tuning a static analysis to find null pointer bugs” (journal article).