Open Access Proceedings Article

An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries

TLDR
This work studies the accuracy of nine state-of-the-art disassemblers on 981 real-world compiler-generated binaries with a wide variety of properties, and reveals a mismatch between expectations in the literature and the actual capabilities of modern disassemblers.
Abstract
It is well-known that static disassembly is an unsolved problem, but how much of a problem is it in real software—for instance, for binary protection schemes? This work studies the accuracy of nine state-of-the-art disassemblers on 981 real-world compiler-generated binaries with a wide variety of properties. In contrast, prior work focuses on isolated corner cases; we show that this has led to a widespread and overly pessimistic view on the prevalence of complex constructs like inline data and overlapping code, leading reviewers and researchers to underestimate the potential of binary-based research. On the other hand, some constructs, such as function boundaries, are much harder to recover accurately than is reflected in the literature, which rarely discusses much needed error handling for these primitives. We study 30 papers recently published in six major security venues, and reveal a mismatch between expectations in the literature, and the actual capabilities of modern disassemblers. Our findings help improve future research by eliminating this mismatch.


This paper is included in the Proceedings of the 25th USENIX Security Symposium, August 10–12, 2016, Austin, TX. ISBN 978-1-931971-32-4. Open access to the Proceedings of the 25th USENIX Security Symposium is sponsored by USENIX.
An In-Depth Analysis of Disassembly
on Full-Scale x86/x64 Binaries
Dennis Andriesse, Xi Chen, and Victor van der Veen, Vrije Universiteit Amsterdam;
Asia Slowinska, Lastline, Inc.; Herbert Bos, Vrije Universiteit Amsterdam
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/andriesse

An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries

Dennis Andriesse, Xi Chen, Victor van der Veen, and Herbert Bos
Computer Science Institute and Amsterdam Department of Informatics, Vrije Universiteit Amsterdam
{d.a.andriesse,x.chen,v.vander.veen,h.j.bos}@vu.nl

Asia Slowinska
Lastline, Inc.
asia@lastline.com
1 Introduction
The capabilities and limitations of disassembly are not always clearly defined or understood, making it difficult for researchers and reviewers to judge the practical feasibility of techniques based on it. At the same time, disassembly is the backbone of research in static binary instrumentation [5, 19, 32], binary code lifting to LLVM IR (for reoptimization or analysis) [38], binary-level vulnerability search [27], and binary-level anti-exploitation systems [1, 8, 29, 46]. Disassembly is thus crucial for analyzing or securing untrusted or proprietary binaries, where source code is simply not available.
The accuracy of disassembly strongly depends on the type of binary under analysis. In the most general case, the disassembler can make very few assumptions on the structure of a binary—high-level concepts like functions and loops have no real significance at the binary level [3]. Moreover, the binary may contain complex constructs, such as overlapping or self-modifying code, or inline data in executable regions. This is especially true for obfuscated binaries, making disassembly of such binaries extremely challenging. Disassembly in general is undecidable [43]. On the other hand, one might expect that compilers emit code with more predictable properties, containing a limited set of patterns that the disassembler may try to identify.
Whether this is true is not well recognized, leading to a wide range of views on disassembly. These vary from the stance that disassembly of benign binaries is a solved problem [48], to the stance that complex cases are rampant [23]. It is unclear which view is justified in a given situation. The aim of our work is thus to study binary disassembly in a realistic setting, and more clearly delineate the capabilities of modern disassemblers.
It is clear from prior work that obfuscated code may complicate disassembly in a myriad of ways [18, 21]. We therefore limit our study to non-obfuscated binaries compiled on modern x86 and x64 platforms (the most common in binary analysis and security research). Specifically, we focus on binaries generated with the popular gcc, clang and Visual Studio compilers. We explore a wide variety of 981 realistic binaries, including stripped, optimized, statically linked, and link-time optimized binaries, as well as library code that includes handcrafted assembly. We disassemble these binaries using nine state-of-the-art research and industry disassemblers, studying their ability to recover all disassembly primitives commonly used in the literature: instructions, function start addresses, function signatures, Control Flow Graphs (CFGs) and callgraphs. In contrast, prior studies focus strongly on complex corner cases in isolation [23, 25]. Our results show that such cases are exceedingly rare, even in optimized code, and that focusing on them leads to an overly pessimistic view on disassembly.
We show that many disassembly primitives can be recovered with better accuracy than previously thought. For instance, instruction accuracy often approaches 100%, even using linear disassembly. On the other hand, we also identify some primitives which are more difficult to recover—most notably, function start information.
To facilitate a better match between the capabilities of disassemblers and the expectations in the literature, we comprehensively study all binary-based papers published in six major security conferences in the last three years. Ironically, this study shows a focus in the literature on rare complex constructs, while little attention is devoted to error handling for primitives that really are prone to inaccuracies. For instance, only 25% of Windows-targeted papers that rely on function information discuss potential inaccuracies, even though the accuracy of function detection regularly drops to 80% or less. Moreover, less than half of all papers implement mechanisms to deal with inaccuracies, even though in most cases errors can lead to malignant failures like crashes.
Contributions & Outline
The contributions of our work are threefold.
(1) We study disassembly on 981 full-scale compiler-generated binaries, to clearly define the true capabilities of modern disassemblers (Section 3) and the implications on binary-based research (Section 4).
(2) Our results allow researchers and reviewers to accurately judge future binary-based research—a task currently complicated by the myriad of differing opinions on the subject. To this end, we release all our raw results and ground truth for use in future evaluations of binary-based research (https://www.vusec.net/projects/disassembly/).
(3) We analyze the quality of all recent binary-based work published in six major security venues by comparing our results to the requirements and assumptions of this work (Section 5). This shows where disassembler capabilities and the literature are mismatched, and how this mismatch can be resolved moving forward (Section 6).
2 Evaluating Real-World Disassembly
This section outlines our disassembly evaluation approach. We discuss our results, and the implications on binary-based research, in Sections 3–4. Sections 5–6 discuss how closely expectations in the literature match our results.
2.1 Binary Test Suite
We focus our analysis on non-obfuscated x86 and x64 binaries generated with modern compilers. Our experiments are based on Linux (ELF) and Windows (PE) binaries, generated with the popular gcc v5.1.1, clang v3.7.0 and Visual Studio 2015 compilers—the most recent versions at the time of writing. The x86/x64 instruction set is the most common target in binary-based research. Moreover, x86/x64 is a variable-length instruction set, allowing unique constructs such as overlapping and “misaligned” instructions which can be difficult to disassemble. We exclude obfuscated binaries, as there is no doubt that they can wreak havoc on disassembler performance and we hardly need confirm this in our experiments.
We base our disassembly experiments on a test suite composed of the SPEC CPU2006 C and C++ benchmarks, the widely used and highly optimized glibc-2.22 library, and a set of popular server applications consisting of nginx v1.8.0, lighttpd v1.4.39, opensshd v7.1p2, vsftpd v3.0.3 and exim v4.86. This test suite has several properties which make it representative: (1) It contains a wide variety of realistic C and C++ binaries, ranging from very small to large; (2) These correspond to binaries used in evaluations of other work, making it easier to relate our results to the literature; (3) The tests include highly optimized library code, containing handwritten assembly and complex corner cases which regular applications do not; (4) SPEC CPU2006 compiles on both Linux and Windows, allowing a fair comparison of results between gcc, clang, and Visual Studio.
To study the impact of compiler options on disassembly, we compile the SPEC CPU2006 part of our test suite multiple times with a variety of popular configurations. Specifically: (1) Optimization levels O0, O1, O2 and O3 for gcc, clang and Visual Studio; (2) Optimization for size (Os) on gcc and clang; (3) Static linking and link-time optimization (-flto) on 64-bit gcc; (4) Stripped binaries, as well as binaries with symbols. We compile the servers for both x86 and x64 with gcc and clang, leaving all remaining settings at the Makefile defaults. Finally, we compile glibc-2.22 with 64-bit gcc, to which it is specifically tailored. In total, our test suite contains 981 binaries and shared objects.
2.2 Disassembly Primitives
We test all five common disassembly primitives used in
the literature (see Section 5). Some of these go well
beyond basic instruction recovery, and are only supported
by a subset of the disassemblers we test.
(1) Instructions: The pure assembly-level instructions.
(2) Function starts: Start addresses of the functions
originally defined in the source code.
(3) Function signatures: Parameter lists for functions
found by the disassembler.
(4) Control Flow Graph (CFG) accuracy: The soundness and completeness of the CFG digraphs G_cfg = (V_bb, E_cf), which describe how control flow edges E_cf ⊆ V_bb × V_bb connect the basic blocks V_bb. In practice, disassemblers deviate from the traditional CFG; typically by omitting indirect edges, and sometimes by defining a global CFG rather than per-function CFGs. Therefore, we define the Interprocedural CFG (ICFG): the union of all function-level CFGs, connected through interprocedural call and jump edges. This allows us to abstract from the disassemblers’ varying CFG definitions, by focusing our measurement on the coverage of basic blocks in the ICFG. We pay special attention to hard-to-resolve basic blocks, such as the heads of address-taken functions and switch/case blocks reached via jump tables.
(5) Callgraph accuracy: The correctness of the digraph G = (V_cs ∪ V_f, E_call) linking the set V_cs of call sites to the function starts V_f through call edges E_call ⊆ V_cs × V_f. Similarly to the CFG, disassemblers deviate from the traditional callgraph definition by including only direct call edges. In our experiments, we therefore measure the completeness of this direct callgraph, considering indirect calls and tailcalls separately in our complex case analysis.
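To make the block-coverage measurement concrete, here is a minimal Python sketch (not the evaluation code used for this paper) of an ICFG represented as plain sets, together with the completeness measure over its basic blocks; the names and the set-based representation are illustrative assumptions.

from typing import Iterable, Set, Tuple

def make_icfg(blocks: Iterable[int], edges: Iterable[Tuple[int, int]]):
    # The ICFG is the union of all function-level CFGs: basic block start
    # addresses plus intra- and interprocedural control flow edges.
    return set(blocks), set(edges)

def block_coverage(recovered: Set[int], ground_truth: Set[int]) -> float:
    # Fraction of ground-truth basic blocks that the disassembler recovered
    # (true positives over ground truth).
    return len(recovered & ground_truth) / len(ground_truth)

# Toy example: three of four ground-truth blocks recovered -> 0.75 coverage.
truth_blocks, _ = make_icfg([0x1000, 0x1010, 0x1024, 0x1030], [(0x1000, 0x1010)])
found_blocks, _ = make_icfg([0x1000, 0x1010, 0x1030], [(0x1000, 0x1010)])
print(block_coverage(found_blocks, truth_blocks))  # 0.75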
2.3 Complex Constructs
We also study the prevalence in real-world binaries of
complex corner cases which are often cited as particularly
harmful to disassembly [5, 23, 34].
(1) Overlapping/shared basic blocks: Basic blocks may be shared between different functions, hindering disassemblers from properly separating these functions.
(2) Overlapping instructions: Since x86/x64 uses variable-length instructions without any enforced memory alignment, jumps can target any offset within a multi-byte instruction. This allows the same code bytes to be interpreted as multiple overlapping instructions, some of which may be missed by disassemblers (see the sketch after this list).
(3) Inline data and jump tables: Data bytes may be mixed in with instructions in a code section. Examples of potential inline data include jump tables or local constants. Such data can cause false positive instructions, and can desynchronize the instruction stream if the last few data bytes are mistakenly interpreted as the start of a multi-byte instruction. Disassembly then continues parsing this instruction into the actual code bytes, losing track of the instruction stream alignment.
(4) Switches/case blocks: Switches are a challenge for
basic block discovery, because the switch case blocks are
typically indirect jump targets (encoded in jump tables).
(5) Alignment bytes: Some code (i.e., nop) or data bytes may have no semantic meaning, serving only to align other code for optimization of memory accesses. Alignment bytes may cause desynchronization if they do not encode valid instructions.
(6) Multi-entry functions: Functions may have multiple
basic blocks used as entry points, which can complicate
function start recognition.
[Figure 1: Disassembly methods. Arrows show disassembly flow; gray blocks show missed or corrupted code. Two panels, Recursive and Linear, show the same example: a function f0 whose block BB0 (cmp ecx, edx; jl BB2; jmp BB1) branches to BB1 and BB2, each of which loads a pointer from a function pointer table fptr and makes an indirect call (mov eax, [fptr+ecx]; call eax), plus inline data and indirectly reached functions f1 and f2.]
(7) Tail calls: In this common optimization, a function
ends not with a return, but with a jump to another function.
This makes it more difficult for disassemblers to detect
where the optimized function ends.
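To illustrate construct (2) concretely, the following self-contained Python snippet uses the Capstone decoder (the same decoder we use for ground truth in Section 2.5) to decode one hand-picked x86-64 byte sequence from two different offsets; the bytes and addresses are purely illustrative and do not come from our test suite.

from capstone import Cs, CS_ARCH_X86, CS_MODE_64

md = Cs(CS_ARCH_X86, CS_MODE_64)

# b8 00 83 c0 01 c3 decodes as "mov eax, 0x1c08300; ret" from offset 0,
# but as "add eax, 1; ret" if a jump lands at offset 2, inside the immediate.
code = bytes.fromhex("b80083c001c3")

for start in (0, 2):
    print(f"decoding from offset {start}:")
    for insn in md.disasm(code[start:], 0x1000 + start):
        print(f"  {insn.address:#x}: {insn.mnemonic} {insn.op_str}")

A linear sweep starting at offset 0 only ever produces the first stream; the overlapping add eax, 1 stream is visible only to a disassembler that follows a jump into the middle of the mov.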
2.4 Disassembly & Testing Environment
We conducted all disassembly experiments on an Intel Core i5 4300U machine with 8GB of RAM, running Ubuntu 15.04 with kernel 3.19.0-47. We compiled our gcc and clang test cases on this same machine. The Visual Studio binaries were compiled on an Intel Core i7 3770 machine with 8GB of RAM, running Windows 10.
We tested nine popular industry and research disassemblers: IDA Pro v6.7, Hopper v3.11.5, Dyninst v9.1.0 [5], BAP v0.9.9 [7], ByteWeight v0.9.9 [4], Jakstab v0.8.4 [17], angr v4.6.1.4 [36], PSI v1.1 [47] (the successor of BinCFI [48]), and objdump v2.22. ByteWeight yields only function starts, while Dyninst and PSI support only ELF binaries (for Dyninst, this is due to our Linux testing environment). Jakstab supports only x86 PE binaries. We omit angr results for x86, as angr is optimized for x64. PSI is based on objdump, with added error correction. Section 3 shows that PSI (and all linear disassemblers) perform equivalently to objdump; therefore, we group these under the name linear disassembly.
All others are recursive descent disassemblers, illustrated in Figure 1. These follow control flow to avoid desynchronization by inline data, and to discover complex cases like overlapping instructions. In contrast, linear disassemblers like objdump simply decode all code bytes consecutively, and may be confused by inline data, possibly causing garbled code like BB1 in the figure. Recursive disassemblers avoid this problem, but may miss indirect control flow targets, such as f1 and f2 in the figure.
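To make the two strategies concrete, here is a deliberately simplified Python sketch of both approaches on top of the Capstone decoder; the function names, the worklist logic, and the restriction to direct jump/call targets are assumptions for illustration, not the implementation of any disassembler evaluated in this paper.

from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_GRP_JUMP, CS_GRP_CALL, CS_GRP_RET
from capstone.x86 import X86_OP_IMM

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True  # needed for instruction groups and operand types

def linear_sweep(code: bytes, base: int):
    # objdump-style: decode every byte consecutively; inline data can
    # desynchronize the stream and produce garbled instructions.
    return {insn.address: insn for insn in md.disasm(code, base)}

def recursive_descent(code: bytes, base: int, entry_points):
    # Follow control flow from known entry points; never decodes inline data,
    # but misses code reachable only through indirect jumps or calls.
    seen, worklist = {}, list(entry_points)
    while worklist:
        addr = worklist.pop()
        while base <= addr < base + len(code) and addr not in seen:
            insn = next(md.disasm(code[addr - base:], addr), None)
            if insn is None:                   # undecodable bytes: abandon this path
                break
            seen[insn.address] = insn
            if insn.group(CS_GRP_JUMP) or insn.group(CS_GRP_CALL):
                op = insn.operands[0]
                if op.type == X86_OP_IMM:      # direct target: queue it
                    worklist.append(op.imm)
                if insn.mnemonic == "jmp":     # unconditional: no fall-through
                    break
            if insn.group(CS_GRP_RET):
                break
            addr = insn.address + insn.size    # fall through to next instruction
    return seen

On the example of Figure 1, linear_sweep would decode straight through the inline data and may garble BB1, while recursive_descent would recover BB0, BB1 and BB2 but never reach f1 and f2, whose addresses appear only in the fptr table.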
2.5 Ground Truth
Our disassembly experiments require precise ground truth on instructions, basic blocks and function starts, call sites, function signatures and switch/case addresses. This information is normally only available at the source level. Clearly, we cannot obtain our ground truth from any disassembler, as this would bias our experiments.
We base our ELF ground truth on information collected by an LLVM analysis pass, and on DWARF v3 debugging information. Specifically, we use LLVM to collect source-level information, such as the source lines belonging to functions and switch statements. We then compile our test binaries with DWARF information, and link the source-level line numbers to the binary-level addresses using the DWARF line number table. We also use DWARF information on function parameters for our function signature analysis. We strip the DWARF information from the binaries before our disassembly experiments.
The line number table provides a full mapping of source lines to binary, but not all instructions correspond directly to a source line. To find these instructions, we use Capstone v3.0.4 to start a conservative linear disassembly sweep from each known instruction address, stopping at control flow instructions unless we can guarantee the validity of their destination and fall-through addresses. For instance, the target of a direct unconditional jump instruction can be guaranteed, while its fall-through block cannot (as it might contain inline data).
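Purely as an illustration (our real ground-truth generation applies the validity rules described above, which this sketch simplifies by stopping at every control-flow instruction), such a conservative sweep could look as follows; known_addrs stands for the instruction addresses taken from the DWARF line number table.

from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_GRP_JUMP, CS_GRP_CALL, CS_GRP_RET

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True

def conservative_sweep(code: bytes, base: int, known_addrs):
    # Extend the set of known instruction start addresses by sweeping forward
    # from each seed, stopping at any control-flow instruction whose successors
    # are not guaranteed to be code (here: at every control-flow instruction).
    truth = set(known_addrs)
    for seed in sorted(known_addrs):
        for insn in md.disasm(code[seed - base:], seed):
            truth.add(insn.address)
            if insn.group(CS_GRP_JUMP) or insn.group(CS_GRP_CALL) or insn.group(CS_GRP_RET):
                break  # the fall-through or target might be inline data
    return truth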
This approach yields ground truth for over 98% of
code bytes in the tested binaries. We manually analyze
the remaining bytes, which are typically alignment code
unreachable by control flow. The result is a ground truth
file for each binary test case, which specifies the type of
each code byte, as well as instruction and function starts,
switch/case addresses, and function signatures.
We use a similar method for the Windows PE tests,
but based on information from PDB (Program Database)
files produced by Visual Studio instead of DWARF. This
produces files analogous to our ELF ground truth format.
We release all our ground truth files and our test suite,
to aid in future evaluations of binary-based research and
disassembly.
3 Disassembly Results
This section describes the results of our disassembly experiments, using the methodology outlined in Section 2. We first discuss application binaries (SPEC and servers), followed by a separate discussion on highly optimized libraries. Finally, we discuss the impact of static linking and link-time optimization. We release all our raw results, and present aggregated results here for space reasons.
3.1 Application Binaries
This section presents disassembly results for application
code. We discuss accuracy results for all primitives, and
also analyze the prevalence of complex cases.
3.1.1 SPEC CPU2006 Results
Figures 2a–2e show the accuracy for the SPEC CPU2006 C and C++ benchmarks of the recovered instructions, function starts, function signatures, CFGs and callgraphs, respectively. We show the percentage of correctly recovered (true positive) primitives for each tested compiler at optimization levels O0–O3. Note that the legend in Figure 2a applies to Figures 2a–2e. All lines are geometric mean results (simply referred to as “mean” from this point); arithmetic means and standard deviations are discussed in the text where they differ significantly. We show separate results for the C and C++ benchmarks, to expose variations in disassembly accuracy that may result from different code patterns.
Some disassemblers support only a subset of the tested primitives. For instance, linear disassembly provides only instructions, and IDA Pro is the only tested disassembler that provides function signatures. Moreover, some disassemblers only support a subset of the tested binary types, and are therefore only shown in the plots where they are applicable. For clarity, the graphs only show results for stripped binaries; our tests with standard symbols (not DWARF information) are discussed in the text.
3.1.1.1 Instruction boundaries
Figure 2a shows the percentage of correctly recovered instructions. Interestingly, linear disassembly consistently outperforms all other disassemblers, finding 100% of the instructions for gcc and clang binaries (without false positives), and 99.92% in the worst case for Visual Studio.
Linear disassembly. The perfect accuracy for linear disassembly with gcc and clang owes to the fact that these compilers never produce inline data, not even for jump tables. Instead, jump tables and other data are placed in the .rodata section.
Visual Studio does produce inline data, typically jump tables. This leads to some false positives with linear disassembly (data treated as code), amounting to a worst-case mean of 989 false positive instructions (0.56% of the disassembled code) for the x86 C++ tests at O3. The number of missed instructions (false negatives, due to desynchronization) is much lower, at a worst-case mean of 0.09%. This is because x86/x64 disassembly automatically resynchronizes within two or three instructions [21].
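This self-resynchronizing behavior is easy to reproduce with a decoder. In the hypothetical snippet below, a single stray data byte is prepended to a small hand-written instruction sequence; the linear decode starts out misaligned but realigns with the true instruction starts within a couple of instructions (all byte values and addresses are illustrative, not taken from our test binaries).

from capstone import Cs, CS_ARCH_X86, CS_MODE_64

md = Cs(CS_ARCH_X86, CS_MODE_64)

# True code: push rbp; mov rbp, rsp; mov eax, 0; pop rbp; ret
code = bytes.fromhex("554889e5b8000000005dc3")
stray = b"\x01"  # one inline-data byte placed before the real code

true_starts = {0x1001, 0x1002, 0x1005, 0x100a, 0x100b}  # real instruction starts
for insn in md.disasm(stray + code, 0x1000):
    status = "aligned" if insn.address in true_starts else "desynced"
    print(f"{insn.address:#x}: {insn.mnemonic} {insn.op_str}  [{status}]")

The first two decoded instructions are bogus, but from the third one (mov eax, 0 at 0x1005) the decode is back in sync, which is why the false-negative rates above stay so low even when inline data is misclassified as code.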

Citations

Ramblr: Making Reassembly Great Again.

TL;DR: This paper presents a new systematic approach to binary reassembly, implemented in a tool called Ramblr, which successfully reassembles most of the tested binaries and improves over the state-of-the-art approach.

Neural Nets Can Learn Function Type Signatures From Binaries

TL;DR: EKLAVYA is a new system that trains a recurrent neural network to recover function type signatures from disassembled binary code; it generalizes well across the compilers tested, on two different instruction sets and various optimization levels, without any specialized prior knowledge of the instruction set, compiler, or optimization level.

SPAIN: security patch analysis for binaries towards understanding the pain and pills

TL;DR: SPAIN is a scalable binary-level patch analysis framework that automatically identifies security patches, summarizes patch patterns and their corresponding vulnerability patterns, and can be used to search for similar patches or vulnerabilities in binary programs.

RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization

TL;DR: RetroWrite is a binary-rewriting instrumentation framework supporting American Fuzzy Lop and Address Sanitizer; it is shown to achieve compiler-level performance while retaining precision, and to be identical in performance to compiler-instrumented binaries.

Compiler-Agnostic Function Detection in Binaries

TL;DR: Nucleus is a novel function detection algorithm for binaries that is compiler-agnostic, and does not require any learning phase or signature information, making it inherently suitable for difficult cases such as non-contiguous or multi-entry functions.
References

Obfuscation of executable code to improve resistance to static disassembly

TL;DR: Experimental results indicate that significant portions of executables that have been obfuscated using the techniques described are disassembled incorrectly, thereby showing the efficacy of the methods.

Practical Control Flow Integrity and Randomization for Binary Executables

TL;DR: CCFIR (Compact Control Flow Integrity and Randomization) is a new practical and realistic protection method that addresses the main barriers to CFI adoption and can stop control-flow hijacking attacks, including ROP and return-into-libc.

Control flow integrity for COTS binaries

TL;DR: This is the first work to apply CFI to complex shared libraries such as glibc; it is effective against control-flow hijack attacks and eliminates the vast majority of ROP gadgets.

BAP: a binary analysis platform

TL;DR: BAP explicitly represents all side effects of instructions in an intermediate language (IL), making syntax-directed analysis possible; it is routinely used to generate and solve verification conditions that are hundreds of megabytes in size and encompass hundreds of thousands of assembly instructions.

Static disassembly of obfuscated binaries

TL;DR: Novel binary analysis techniques are presented that substantially improve the success of the disassembly process when confronted with obfuscated binaries, and a large fraction of the program's instructions can be correctly identified.