Valgrind: a framework for heavyweight dynamic binary instrumentation

doi:10.1145/1250734.1250746

Proceedings Article•DOI•

10 Jun 2007-Vol. 42, Iss: 6, pp 89-100

TL;DR: The paper describes Valgrind, a DBI framework designed for building heavyweight DBA tools, including tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.

Abstract: Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance; little attention has been paid to their capabilities. As a result, we believe the potential of DBI has not been fully exploited. In this paper we describe Valgrind, a DBI framework designed for building heavyweight DBA tools. We focus on its unique support for shadow values (a powerful but previously little-studied and difficult-to-implement DBA technique), which requires a tool to shadow every register and memory value with another value that describes it. This support accounts for several crucial design features that distinguish Valgrind from other DBI frameworks. Because of these features, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.
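
To make the shadow-value idea concrete, here is a toy sketch in which every register and memory byte carries a shadow value, in this case a single "defined" bit in the spirit of an undefined-value checker. The class and method names are hypothetical and this is not Valgrind's tool API; the propagation rule (a result is defined only if all its inputs are) is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowMachine:
    """Toy machine in which every register and memory byte has a shadow value.

    Here the shadow is a single bool: True if the value is defined
    (initialized), False if not. Hypothetical illustration only.
    """

    regs: dict = field(default_factory=dict)        # register -> value
    reg_shadow: dict = field(default_factory=dict)  # register -> defined?
    mem: dict = field(default_factory=dict)         # address -> byte
    mem_shadow: dict = field(default_factory=dict)  # address -> defined?
    errors: list = field(default_factory=list)

    def store_const(self, addr, byte):
        # Writing a constant produces a defined byte.
        self.mem[addr] = byte & 0xFF
        self.mem_shadow[addr] = True

    def alloc_uninit(self, addr):
        # Fresh allocation: the byte exists but its contents are undefined.
        self.mem[addr] = 0
        self.mem_shadow[addr] = False

    def load(self, reg, addr):
        # The shadow travels with the value from memory into the register.
        self.regs[reg] = self.mem[addr]
        self.reg_shadow[reg] = self.mem_shadow[addr]

    def add(self, dst, a, b):
        # Simplified rule: the result is defined only if both inputs are.
        self.regs[dst] = (self.regs[a] + self.regs[b]) & 0xFF
        self.reg_shadow[dst] = self.reg_shadow[a] and self.reg_shadow[b]

    def branch_on(self, reg):
        # Shadows are checked where a value affects control flow.
        if not self.reg_shadow[reg]:
            self.errors.append(f"conditional jump depends on undefined value in {reg}")
        return self.regs[reg] != 0
```

For example, loading a defined byte into `r1` and an uninitialized one into `r2`, adding them into `r3`, and branching on `r3` records one error, because the undefinedness of `r2` propagates through the addition.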

Citations

Journal Article•DOI•

Twelve years of SAMtools and BCFtools.

Petr Danecek¹, James K. Bonfield¹, Jennifer Liddle¹, John Marshall², Valeriu Ohan¹, Martin O. Pollard¹, Andrew Whitwham¹, Thomas M. Keane³, Shane A. McCarthy¹, Robert L. Davies¹, Heng Li⁴•Institutions (4)

Wellcome Trust Sanger Institute¹, University of Glasgow², European Bioinformatics Institute³, Harvard University⁴

01 Feb 2021-GigaScience

TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.

Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Journal Article•DOI•

Fast and SNP-tolerant detection of complex variants and splicing in short reads

Thomas D. Wu¹, Serban Nacu¹•Institutions (1)

Genentech¹

01 Apr 2010-Bioinformatics

TL;DR: Computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index are presented.

Abstract: Motivation: Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state. Results: In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of ≥70 nt and is fastest in detecting complex variants with four or more mismatches or insertions of 1–9 nt and deletions of 1–30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7–8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads. Availability: Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http://share.gene.com/gmap.
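
The SNP-tolerant alignment described above (aligning against a reference space that contains both major and minor alleles) can be illustrated with a mismatch counter that accepts either allele at known SNP positions. This is a simplified sketch, not GSNAP's algorithm: it assumes an ungapped alignment at a fixed offset, and the `snps` dictionary format is a hypothetical input.

```python
def count_mismatches(read, reference, offset, snps=None):
    """Count mismatches of `read` aligned without gaps at reference[offset:].

    `snps` (hypothetical format) maps a 0-based reference position to a
    collection of alternate (minor) alleles. A read base that matches either
    the reference (major) allele or a listed alternate is not a mismatch.
    """
    snps = snps or {}
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit at this offset")
    mismatches = 0
    for i, base in enumerate(read):
        pos = offset + i
        allowed = {reference[pos]} | set(snps.get(pos, ()))
        if base not in allowed:
            mismatches += 1
    return mismatches
```

With reference `"ACGTACGTAC"`, the read `"ACGAAC"` at offset 0 has one mismatch (A versus T at position 3), but zero once position 3 is declared a SNP with alternate allele `A`.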

1,958 citations

Cites methods from "Valgrind: a framework for heavyweig..."

...We measured the amount of heap memory used by the programs using the Valgrind Massif tool (Nethercote and Seward, 2007)....

Proceedings Article•

Automated Whitebox Fuzz Testing.

Patrice Godefroid¹, Michael Y. Levin¹, David Molnar²•Institutions (2)

Microsoft¹, University of California, Berkeley²

01 Nov 2008

TL;DR: This work presents an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation, and implements it in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications.

Abstract: Fuzz testing is an effective technique for finding security vulnerabilities in software. Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program and test the resulting values. We present an alternative whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation. Our approach records an actual run of the program under test on a well-formed input, symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the program uses these. The collected constraints are then negated one by one and solved with a constraint solver, producing new inputs that exercise different control paths in the program. This process is repeated with the help of a code-coverage maximizing heuristic designed to find defects as fast as possible. We have implemented this algorithm in SAGE (Scalable, Automated, Guided Execution), a new tool employing x86 instruction-level tracing and emulation for whitebox fuzzing of arbitrary file-reading Windows applications. We describe key optimizations needed to make dynamic test generation scale to large input files and long execution traces with hundreds of millions of instructions. We then present detailed experiments with several Windows applications. Notably, without any format-specific knowledge, SAGE detects the MS07-017 ANI vulnerability, which was missed by extensive blackbox fuzzing and static analysis tools. Furthermore, while still in an early stage of development, SAGE has already discovered 30+ new bugs in large shipped Windows applications including image processors, media players, and file decoders. Several of these bugs are potentially exploitable memory access violations.
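
The loop described in this abstract (record a run, collect branch constraints on the input, negate them one at a time, solve for new inputs, repeat) can be sketched with a toy program whose branches each compare one input byte against a constant. This is a simplified illustration, not SAGE: real whitebox fuzzers symbolically execute x86 traces and call a constraint solver, and SAGE orders its work with a coverage-maximizing heuristic, whereas this sketch assumes single-byte equality constraints, a trivial hand-written "solver", and a plain FIFO worklist. All names are hypothetical.

```python
def toy_program(data, trace):
    """Program under test: "crashes" only on inputs starting with b"BUG".

    Each branch appends (byte_index, constant, taken) to `trace`, standing in
    for the symbolic constraint data[byte_index] == constant or its negation.
    """
    def check(i, c):
        taken = data[i] == c
        trace.append((i, c, taken))
        return taken

    if check(0, ord("B")) and check(1, ord("U")) and check(2, ord("G")):
        return "crash"
    return "ok"


def solve(constraints, seed):
    """Toy solver for (index, const, must_equal) constraints on single bytes.

    Returns a modified copy of `seed` satisfying all of them, or None if they
    conflict. A real tool would hand the constraints to a constraint solver.
    """
    required, forbidden = {}, {}
    for i, c, must_equal in constraints:
        if must_equal:
            if required.get(i, c) != c:
                return None
            required[i] = c
        else:
            forbidden.setdefault(i, set()).add(c)
    new = bytearray(seed)
    for i, bad in forbidden.items():
        if required.get(i) in bad:
            return None
        while new[i] in bad:
            new[i] = (new[i] + 1) % 256
    for i, c in required.items():
        new[i] = c
    return bytes(new)


def explore(program, seed, max_runs=100):
    """Run, negate each recorded branch constraint in turn, solve, repeat."""
    worklist, seen, crashes, runs = [seed], {seed}, [], 0
    while worklist and runs < max_runs:
        data = worklist.pop(0)
        trace = []
        runs += 1
        if program(data, trace) == "crash":
            crashes.append(data)
        for k, (i, c, taken) in enumerate(trace):
            # Keep the path prefix before branch k and flip branch k itself.
            child = solve(trace[:k] + [(i, c, not taken)], data)
            if child is not None and child not in seen:
                seen.add(child)
                worklist.append(child)
    return crashes
```

Starting from the seed `b"AAAA"`, each round unlocks one more byte of the magic prefix, so the search reaches the crashing input `b"BUGA"` after a handful of runs without any random mutation.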

1,221 citations

Cites methods from "Valgrind: a framework for heavyweig..."

...With online generation, constraints are generated as the program is executed either by statically injected instrumentation code or with the help of dynamic binary instrumentation tools such as Nirvana [3] or Valgrind [27] (Catchconv is an example of the latter approach [24].)...

Proceedings Article•

AddressSanitizer: a fast address sanity checker

Konstantin Serebryany¹, Derek Bruening¹, Alexander Potapenko¹, Dmitriy Vyukov¹•Institutions (1)

Google¹

13 Jun 2012

TL;DR: The paper presents AddressSanitizer, a new memory error detector that achieves efficiency without sacrificing comprehensiveness, and has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.

Abstract: Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both. This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware. AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.
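
The AddressSanitizer paper maps each 8-byte-aligned application granule to one shadow byte at `(Addr >> 3) + Offset`; a shadow value of 0 means all 8 bytes are addressable, a value k in 1..7 means only the first k bytes are, and negative values mark unaddressable memory. The sketch below models that encoding with a dictionary and a hypothetical bump allocator that places a poisoned redzone after each block. The offset, redzone size, and helper names are assumptions for illustration; the real tool instruments compiled code and uses its own allocator layout.

```python
SHADOW_OFFSET = 0x1000  # hypothetical; real offsets are platform-specific
REDZONE = 16            # hypothetical right-redzone size (multiple of 8)
POISONED = -1           # stand-in for a negative "unaddressable" shadow value


def shadow_addr(addr):
    # Scale-and-offset mapping: one shadow byte per 8-byte granule.
    return (addr >> 3) + SHADOW_OFFSET


class ToyAsan:
    def __init__(self, heap_base=0x10000):
        self.shadow = {}        # shadow address -> shadow value; absent means 0
        self.next = heap_base   # bump allocator, always 8-byte aligned

    def malloc(self, size):
        addr = self.next
        full, tail = divmod(size, 8)
        for g in range(full):
            self.shadow[shadow_addr(addr + 8 * g)] = 0        # all 8 bytes OK
        end = addr + 8 * full
        if tail:
            self.shadow[shadow_addr(end)] = tail              # first `tail` bytes OK
            end += 8
        for g in range(REDZONE // 8):
            self.shadow[shadow_addr(end + 8 * g)] = POISONED  # redzone
        self.next = end + REDZONE
        return addr

    def is_bad_access(self, addr, size):
        """Check a 1-, 2-, 4- or 8-byte access that stays within one granule."""
        k = self.shadow.get(shadow_addr(addr), 0)
        if size == 8:
            return k != 0
        return k != 0 and (addr & 7) + size - 1 >= k
```

For a 13-byte allocation, the second granule's shadow is 5, so a 1-byte read at offset 12 passes while one at offset 13, or a 4-byte read starting at offset 12, is reported.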

795 citations

Cites background from "Valgrind: a framework for heavyweig..."

...Typically an application address is mapped to a shadow address either by a direct scale and offset, where the full application address space is mapped to a single shadow address space, or by extra levels of translation involving table lookups....
...Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++....
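
The first snippet contrasts two ways of locating shadow memory. The sketch below implements the table-based alternative as a two-level map: a primary table indexed by the high address bits, whose entries point at lazily allocated secondary chunks. It shows the one-line direct scale-and-offset mapping for comparison. Chunk size, shadow granularity, and all names are assumptions chosen for illustration; Valgrind's Memcheck organizes its shadow memory along broadly similar primary/secondary lines, but this is not its code.

```python
CHUNK_BITS = 16               # hypothetical: each secondary chunk covers 64 KiB
CHUNK_SIZE = 1 << CHUNK_BITS


def direct_shadow(addr, scale_shift=3, offset=0x1000):
    # Direct mapping: one shift and one add, no memory lookups.
    return (addr >> scale_shift) + offset


class TwoLevelShadow:
    """One shadow byte per application byte, stored in lazily created chunks."""

    def __init__(self, default=0):
        self.primary = {}       # high address bits -> secondary bytearray
        self.default = default  # shadow of never-written memory (0..255)

    def _secondary(self, addr, create):
        hi = addr >> CHUNK_BITS
        sec = self.primary.get(hi)
        if sec is None and create:
            sec = bytearray([self.default]) * CHUNK_SIZE
            self.primary[hi] = sec
        return sec

    def get(self, addr):
        # Reads of untouched regions do not allocate a secondary chunk.
        sec = self._secondary(addr, create=False)
        if sec is None:
            return self.default
        return sec[addr & (CHUNK_SIZE - 1)]

    def set(self, addr, value):
        self._secondary(addr, create=True)[addr & (CHUNK_SIZE - 1)] = value
```

The trade-off is the one the snippet implies: the direct mapping costs a shift and an add per access but reserves a large contiguous shadow region, while the table costs an extra lookup per access and only allocates shadow for regions that are actually touched.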

Proceedings Article•DOI•

Driller: Augmenting Fuzzing Through Selective Symbolic Execution.

Nick Stephens¹, John Grosen¹, Christopher Salls¹, Andrew Dutcher¹, Ruoyu Wang¹, Jacopo Corbetta¹, Yan Shoshitaishvili¹, Christopher Kruegel¹, Giovanni Vigna¹•Institutions (1)

University of California, Santa Barbara¹

01 Jan 2016

TL;DR: Driller is presented, a hybrid vulnerability excavation tool which leverages fuzzing and selective concolic execution in a complementary manner, to find deeper bugs and mitigate their weaknesses, avoiding the path explosion inherent in concolic analysis and the incompleteness of fuzzing.

Abstract: Memory corruption vulnerabilities are an ever-present risk in software, which attackers can exploit to obtain unauthorized access to confidential information. As products with access to sensitive data are becoming more prevalent, the number of potentially exploitable systems is also increasing, resulting in a greater need for automated software vetting tools. DARPA recently funded a competition, with millions of dollars in prize money, to further research focusing on automated vulnerability finding and patching, showing the importance of research in this area. Current techniques for finding potential bugs include static, dynamic, and concolic analysis systems, each with its own advantages and disadvantages. A common limitation of systems designed to create inputs which trigger vulnerabilities is that they only find shallow bugs and struggle to exercise deeper paths in executables. We present Driller, a hybrid vulnerability excavation tool which leverages fuzzing and selective concolic execution in a complementary manner, to find deeper bugs. Inexpensive fuzzing is used to exercise compartments of an application, while concolic execution is used to generate inputs which satisfy the complex checks separating the compartments. By combining the strengths of the two techniques, we mitigate their weaknesses, avoiding the path explosion inherent in concolic analysis and the incompleteness of fuzzing. Driller uses selective concolic execution to explore only the paths deemed interesting by the fuzzer and to generate inputs for conditions that the fuzzer cannot satisfy. We evaluate Driller on 126 applications released in the qualifying event of the DARPA Cyber Grand Challenge and show its efficacy by identifying the same number of vulnerabilities, in the same time, as the top-scoring team of the qualifying event.

778 citations

Cites background from "Valgrind: a framework for heavyweig..."

...When the fuzzing component has gone through a predetermined amount (proportional to the input length) of mutations without identifying new state transitions, we consider it “stuck”....
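
The "stuck" criterion in this snippet amounts to a small counter: reset it whenever a mutated input produces a new state transition, and declare the fuzzer stuck once the number of unproductive mutations reaches a budget proportional to the input length. In the minimal sketch below, the proportionality constant `per_byte_budget` and the representation of state transitions as hashable edge identifiers are hypothetical placeholders, not values from the Driller paper.

```python
class StuckDetector:
    """Tracks unproductive mutations against a length-proportional budget."""

    def __init__(self, input_length, per_byte_budget=1000):
        # Budget grows linearly with input length; the constant is assumed.
        self.budget = per_byte_budget * max(1, input_length)
        self.seen_transitions = set()
        self.unproductive = 0

    def record_run(self, transitions):
        """Record one mutated run's transitions; return True if stuck."""
        new = set(transitions) - self.seen_transitions
        if new:
            self.seen_transitions |= new
            self.unproductive = 0  # progress: restart the count
        else:
            self.unproductive += 1
        return self.is_stuck()

    def is_stuck(self):
        return self.unproductive >= self.budget
```

In a hybrid loop like the one the abstract describes, a `True` result is the point at which the concolic component would be invoked to find inputs for conditions the fuzzer cannot satisfy.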

References

Journal Article•DOI•

Pin: building customized program analysis tools with dynamic instrumentation

Chi-Keung Luk¹, Robert Cohn¹, Robert Muth¹, Harish Patil¹, Artur Klauser¹, Geoff Lowney¹, Steven Wallace¹, Vijay Janapa Reddi², Kim Hazelwood¹•Institutions (2)

Intel¹, University of Colorado Boulder²

12 Jun 2005

TL;DR: Pin aims to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate its versatility, the paper describes two Pintools in daily use to analyze production software.

Abstract: Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application's original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin's versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.
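
Basic-block counting, the benchmark quoted in the Pin abstract, is the canonical lightweight DBI tool: the framework inserts a counter increment at the entry of every basic block it translates. The toy below imitates that by calling an analysis hook on entry to each block of a tiny interpreted program. The program representation (a dict of block functions that return the next block label) and every name here are hypothetical and unrelated to Pin's actual C/C++ API.

```python
from collections import Counter


def run(blocks, entry, state, on_block_entry=None):
    """Interpret a toy program: each block maps state -> next label (None halts)."""
    label = entry
    while label is not None:
        if on_block_entry is not None:
            on_block_entry(label)  # the analysis call a DBI tool would insert
        label = blocks[label](state)
    return state


def count_blocks(blocks, entry, state):
    counts = Counter()

    def bump(label):
        counts[label] += 1

    run(blocks, entry, state, on_block_entry=bump)
    return counts


def make_sum_program(n):
    """Hypothetical example program computing acc = 1 + 2 + ... + n."""
    def init(s):
        s["i"], s["acc"] = 0, 0
        return "head"

    def head(s):
        return "body" if s["i"] < n else "exit"

    def body(s):
        s["i"] += 1
        s["acc"] += s["i"]
        return "head"

    def exit_(s):
        return None

    return {"init": init, "head": head, "body": body, "exit": exit_}
```

For `n = 3`, the loop header runs four times and the body three times, while the instrumented run leaves the program's own result (`acc == 6`) unchanged, which is the transparency property the abstract emphasizes.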

4,019 citations

Proceedings Article•

Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software

James Newsome¹, Dawn Song¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 2005

TL;DR: TaintCheck performs dynamic taint analysis via binary rewriting at run time; it reliably detects most types of exploits and produced no false positives for any of the many different programs that were tested.

Abstract: Software vulnerabilities have had a devastating effect on the Internet. Worms such as CodeRed and Slammer can compromise hundreds of thousands of hosts within hours or even minutes, and cause millions of dollars of damage [26, 43]. To successfully combat these fast automatic Internet attacks, we need fast automatic attack detection and filtering mechanisms. In this paper we propose dynamic taint analysis for automatic detection of overwrite attacks, which include most types of exploits. This approach does not need source code or special compilation for the monitored program, and hence works on commodity software. To demonstrate this idea, we have implemented TaintCheck, a mechanism that can perform dynamic taint analysis by performing binary rewriting at run time. We show that TaintCheck reliably detects most types of exploits. We found that TaintCheck produced no false positives for any of the many different programs that we tested. Further, we describe how TaintCheck could improve automatic signature generation in
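
Dynamic taint analysis, as described for TaintCheck, marks data from untrusted sources as tainted, propagates taint through data movement and arithmetic, and raises an alert when tainted data is used in a sensitive way, such as a jump target. The toy interpreter below walks through that pipeline on a made-up instruction set; the instruction names and the single jump-target sink are simplifying assumptions, not TaintCheck's implementation, which rewrites real binaries at run time.

```python
def run_with_taint(program, untrusted_input):
    """Execute toy instructions while tracking a taint bit for every register.

    Instructions (hypothetical):
      ("read", dst)        dst <- next untrusted input word (tainted)
      ("const", dst, v)    dst <- v (untainted)
      ("add", dst, a, b)   dst <- a + b; taint = taint(a) or taint(b)
      ("mov", dst, src)    dst <- src; taint copied
      ("jmp", reg)         indirect jump through reg (checked sink only;
                           the toy does not actually transfer control)
    Returns a list of alert strings.
    """
    regs, taint, alerts = {}, {}, []
    inputs = iter(untrusted_input)
    for pc, ins in enumerate(program):
        op = ins[0]
        if op == "read":
            regs[ins[1]], taint[ins[1]] = next(inputs), True
        elif op == "const":
            regs[ins[1]], taint[ins[1]] = ins[2], False
        elif op == "add":
            _, dst, a, b = ins
            regs[dst] = regs[a] + regs[b]
            taint[dst] = taint[a] or taint[b]
        elif op == "mov":
            _, dst, src = ins
            regs[dst], taint[dst] = regs[src], taint[src]
        elif op == "jmp":
            if taint[ins[1]]:
                alerts.append(f"pc {pc}: jump target in {ins[1]} derived from untrusted input")
        else:
            raise ValueError(f"unknown op {op!r}")
    return alerts
```

A jump through a register computed from an input word plus a constant triggers an alert, whereas a jump through a register that only ever held constants does not.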

1,557 citations

Book•

Programming Perl

Larry Wall, Mike Loukides

01 Jan 1991

TL;DR: This third edition of Programming Perl has been expanded to cover version 5.6 of this maturing language, and new topics include threading, the compiler, Unicode, and other new features that have been added since the previous edition.

Abstract: From the Publisher: Perl is a powerful programming language that has grown in popularity since it first appeared in 1988. The first edition of this book, Programming Perl, hit the shelves in 1990, and was quickly adopted as the undisputed bible of the language. Since then, Perl has grown with the times, and so has this book. Programming Perl is not just a book about Perl. It is also a unique introduction to the language and its culture, as one might expect only from its authors. Larry Wall is the inventor of Perl, and provides a unique perspective on the evolution of Perl and its future direction. Tom Christiansen was one of the first champions of the language, and lives and breathes the complexities of Perl internals as few other mortals do. Jon Orwant is the editor of The Perl Journal, which has brought together the Perl community as a common forum for new developments in Perl. Any Perl book can show the syntax of Perl's functions, but only this one is a comprehensive guide to all the nooks and crannies of the language. Any Perl book can explain typeglobs, pseudohashes, and closures, but only this one shows how they really work. Any Perl book can say that my is faster than local, but only this one explains why. Any Perl book can have a title, but only this book is affectionately known by all Perl programmers as The Camel. This third edition of Programming Perl has been expanded to cover version 5.6 of this maturing language. New topics include threading, the compiler, Unicode, and other new features that have been added since the previous edition.

1,086 citations

"Valgrind: a framework for heavyweig..." refers to methods in this paper

...For example, Perl’s “taint mode” [29] and Patil and Fischer’s bounds checker for C [21] implement analyses similar to those of TaintCheck and Annelid (see Section 1) at the level of source code....

Journal Article•DOI•

Dynamo: a transparent dynamic optimization system

Vasanth Bala¹, Evelyn Duesterwald¹, Sanjeev Banerjia¹•Institutions (1)

Hewlett-Packard¹

01 May 2000

TL;DR: The design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor, are described and evaluated.

Abstract: We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT for example), or it can come from the execution of a statically compiled native binary. This paper evaluates the Dynamo system in the latter, more challenging situation, in order to emphasize the limits, rather than the potential, of the system. Our experiments demonstrate that even statically optimized native binaries can be accelerated by Dynamo, and often by a significant degree. For example, the average performance of -O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their -O4 optimized version running without Dynamo. Dynamo achieves this by focusing its efforts on optimization opportunities that tend to manifest only at runtime, and hence opportunities that might be difficult for a static compiler to exploit. Dynamo's operation is transparent in the sense that it does not depend on any user annotations or binary instrumentation, and does not require multiple runs, or any special compiler, operating system or hardware support. The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system.

935 citations

Proceedings Article•DOI•

An infrastructure for adaptive dynamic optimization

Derek L. Bruening¹, Timothy Garnett¹, Saman Amarasinghe¹•Institutions (1)

Massachusetts Institute of Technology¹

23 Mar 2003

TL;DR: This work provides an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions.

Abstract: Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions. The interface is not restricted to optimization and can be used for instrumentation, profiling, dynamic translation, etc. To demonstrate the usefulness and effectiveness of our framework, we implemented several optimizations. These improve the performance of some applications by as much as 40% relative to native execution. The average speedup relative to base DynamoRIO performance is 12%.

523 citations