
Journal ArticleDOI

Pin: building customized program analysis tools with dynamic instrumentation

12 Jun 2005 - Vol. 40, Iss. 6, pp. 190-200

TL;DR: Pin's goals are to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Abstract: Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures. However, a Pintool can access architecture-specific details when necessary. Instrumentation with Pin is mostly transparent as the application and Pintool observe the application's original, uninstrumented behavior. Pin uses dynamic compilation to instrument executables while they are running. For efficiency, Pin uses several techniques, including inlining, register re-allocation, liveness analysis, and instruction scheduling to optimize instrumentation. This fully automated approach delivers significantly better instrumentation performance than similar tools. For example, Pin is 3.3x faster than Valgrind and 2x faster than DynamoRIO for basic-block counting. To illustrate Pin's versatility, we describe two Pintools in daily use to analyze production software. Pin is publicly available for Linux platforms on four architectures: IA32 (32-bit x86), EM64T (64-bit x86), Itanium®, and ARM. In the ten months since Pin 2 was released in July 2004, there have been over 3000 downloads from its website.
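
To make the ATOM-style model concrete, here is a minimal instruction-counting Pintool in the spirit of the inscount example shipped with Pin; this is a sketch against the published API, not code from the paper itself. The instrumentation routine runs once per instruction as Pin compiles it, and the analysis routine runs on every dynamic execution; a call this simple is a candidate for the inlining optimization the abstract mentions.

```cpp
#include <iostream>
#include "pin.H"

static UINT64 icount = 0;

// Analysis routine: executed before every dynamic instruction.
VOID docount() { icount++; }

// Instrumentation routine: invoked when Pin first compiles an
// instruction; inserts a call to the analysis routine before it.
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

// Called when the instrumented application exits.
VOID Fini(INT32 code, VOID *v) {
    std::cerr << "Executed " << icount << " instructions" << std::endl;
}

int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);                       // parse Pin's arguments
    INS_AddInstrumentFunction(Instruction, 0);  // register instrumentation
    PIN_AddFiniFunction(Fini, 0);               // register exit callback
    PIN_StartProgram();                         // run the application; does not return
    return 0;
}
```

Built as a shared object, the tool is applied at launch time (e.g. pin -t inscount.so -- /bin/ls), with no recompilation or relinking of the application.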
Citations

Proceedings ArticleDOI
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan +3 more
04 Oct 2009
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Abstract: This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

2,406 citations


Cites methods from "Pin: building customized program an..."

  • ...While developing and characterizing these benchmarks, we have experienced first-hand the following challenges of the GPU platform: Data Structure Mapping: Programmers must find efficient mappings of their applications’ data structures to CUDA’s hierarchical (grid of thread blocks) domain model....


Proceedings ArticleDOI
10 Jun 2007
TL;DR: Valgrind, a DBI framework designed for building heavyweight DBA tools, is described; although lightweight tools built with it run comparatively slowly, it can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.
Abstract: Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance; little attention has been paid to their capabilities. As a result, we believe the potential of DBI has not been fully exploited. In this paper we describe Valgrind, a DBI framework designed for building heavyweight DBA tools. We focus on its unique support for shadow values, a powerful but previously little-studied and difficult-to-implement DBA technique, which requires a tool to shadow every register and memory value with another value that describes it. This support accounts for several crucial design features that distinguish Valgrind from other DBI frameworks. Because of these features, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.
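
To make the shadow-value technique concrete: a Memcheck-style tool keeps, for every byte of application memory, a shadow byte describing it (e.g., definedness bits). The following is a minimal, hypothetical C++ sketch of such a shadow map; Valgrind's real implementation instead uses a flat multi-level table over the whole address space and shadows registers as well, so this only illustrates the bookkeeping every load and store must perform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical sketch: one shadow byte per application byte.
// 0xFF = fully undefined, 0x00 = fully defined (Memcheck-style V bits).
class ShadowMemory {
    static constexpr std::size_t kPage = 4096;
    // Page-granular map; pages are leaked for brevity. Real tools use a
    // flat multi-level table for speed, not a hash map.
    std::unordered_map<std::uintptr_t, std::uint8_t*> pages_;

    std::uint8_t* page(std::uintptr_t addr) {
        std::uint8_t*& p = pages_[addr / kPage];
        if (!p) {
            p = new std::uint8_t[kPage];
            std::memset(p, 0xFF, kPage);  // memory starts out undefined
        }
        return p;
    }

public:
    // Called on every store: written bytes take the shadow of the source.
    void store(std::uintptr_t addr, std::size_t len, std::uint8_t vbits) {
        for (std::size_t i = 0; i < len; i++)
            page(addr + i)[(addr + i) % kPage] = vbits;
    }

    // Called on every load: returns the shadow describing the loaded value.
    std::uint8_t load(std::uintptr_t addr, std::size_t len) {
        std::uint8_t v = 0x00;
        for (std::size_t i = 0; i < len; i++)
            v |= page(addr + i)[(addr + i) % kPage];  // any undefined byte taints the value
        return v;
    }
};

int main() {
    ShadowMemory sm;
    sm.store(0x1000, 4, 0x00);                // a 4-byte store of a defined value
    bool ok  = (sm.load(0x1000, 4) == 0x00);  // loading it back: fully defined
    bool bad = (sm.load(0x2000, 4) != 0x00);  // never-written memory: undefined
    return (ok && bad) ? 0 : 1;
}
```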

2,346 citations


Proceedings ArticleDOI
Sheng Li, Jung Ho Ahn, Richard Strong, Jay B. Brockman +2 more
12 Dec 2009
TL;DR: Combining McPAT's power, area, and timing results with performance simulation of the PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Abstract: This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA²P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
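
For reference, the combined metrics named above are plain products, with area weighted once in EDAP and twice in EDA²P; a minimal sketch, with all numbers illustrative only:

```cpp
#include <cstdio>

// Illustrative only: combine energy (J), delay (s), and area (mm^2)
// into the composite metrics used in the McPAT paper.
int main() {
    double energy = 12.5;   // joules for a hypothetical workload
    double delay  = 0.8;    // seconds of execution time
    double area   = 220.0;  // mm^2 of die area

    double edp   = energy * delay;               // energy-delay product
    double edap  = energy * delay * area;        // EDAP: area weighted once
    double eda2p = energy * delay * area * area; // EDA²P: area weighted twice

    std::printf("EDP=%g EDAP=%g EDA2P=%g\n", edp, edap, eda2p);
    return 0;
}
```

Weighting area twice in EDA²P penalizes die cost more heavily, which is why the paper's cost-aware conclusion (4-core clusters) differs from its pure energy-delay conclusion (8-core clusters).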

2,287 citations


Cites methods from "Pin: building customized program an..."

  • ...[17] A. Jain, et al., “A 1.2 GHz Alpha Microprocessor with 44.8 GB/s Chip Pin Bandwidth,” in ISSCC, 2001....

  • ...[26] C.-K. Luk, et al., “Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation,” in PLDI, Jun 2005....

  • ...We modified a user-level thread library [34] in the Pin [26] binary instrumentation tool to support more pthread APIs, and use it as a functional simulator to run applications....


Kai Li, Christian Bienia
01 Jan 2011
TL;DR: A methodology to design effective benchmark suites is developed and its effectiveness is demonstrated by developing and deploying a benchmark suite for evaluating multiprocessors called PARSEC, which has been adopted by many architecture groups in both research and industry.
Abstract: Benchmarking has become one of the most important methods for quantitative performance evaluation of processor and computer system designs. Benchmarking of modern multiprocessors such as chip multiprocessors is challenging because of their application domain, scalability and parallelism requirements. In my thesis, I have developed a methodology to design effective benchmark suites and demonstrated its effectiveness by developing and deploying a benchmark suite for evaluating multiprocessors. More specifically, this thesis includes several contributions. First, the thesis shows that a new benchmark suite for multiprocessors is needed because the behavior of modern parallel programs is significantly different from those represented by SPLASH-2, the most popular parallel benchmark suite developed over ten years ago. Second, the thesis quantitatively describes the requirements and characteristics of a set of multithreaded programs and their underlying technology trends. Third, the thesis presents a systematic approach to scale and select benchmark inputs with the goal of optimizing benchmarking accuracy subject to constrained execution or simulation time. Finally, the thesis describes a parallel benchmark suite called PARSEC for evaluating modern shared-memory multiprocessors. Since its initial release, PARSEC has been adopted by many architecture groups in both research and industry.

990 citations


Cites background or methods from "Pin: building customized program an..."

  • ...The data was obtained with Pin [58]....

  • ...CMP$im is a plug-in for Pin [58] that simulates the cache hierarchy of a CMP....


Proceedings ArticleDOI
Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp +3 more
05 Mar 2011
TL;DR: A lightweight, high-performance persistent object system called NV-heaps is implemented; it provides transactional semantics while preventing pointer-safety bugs and offers a model for persistence that is easy to use and reason about.
Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.

781 citations


Cites methods from "Pin: building customized program an..."

  • ...Pin: building customized program analysis tools with dynamic instrumentation....

  • ...The system uses Pin [39] to perform a detailed simulation of the system’s memory hierarchy augmented with non-volatile memory technology and the atomicity support that NV-heaps require....


References

Proceedings ArticleDOI
01 Oct 2002
TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
Abstract: Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
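
A minimal sketch of the Basic Block Vector idea, under the simplifying assumption that basic blocks are numbered 0..n-1: each interval of execution is summarized as a normalized vector of per-block instruction counts, and intervals are compared (for clustering) by Manhattan distance. The counts below are hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// A Basic Block Vector: element i holds the fraction of an interval's
// dynamic instructions spent in basic block i.
using BBV = std::vector<double>;

// Normalize raw per-block instruction counts into a BBV.
BBV normalize(const std::vector<double>& counts) {
    double total = 0;
    for (double c : counts) total += c;
    BBV v(counts.size());
    for (std::size_t i = 0; i < counts.size(); i++) v[i] = counts[i] / total;
    return v;
}

// Manhattan distance between two intervals' BBVs; a small distance means
// the intervals executed similar code and can share a cluster.
double manhattan(const BBV& a, const BBV& b) {
    double d = 0;
    for (std::size_t i = 0; i < a.size(); i++) d += std::fabs(a[i] - b[i]);
    return d;
}

int main() {
    // Hypothetical raw counts for two execution intervals.
    BBV x = normalize({50.0, 30.0, 20.0, 0.0});
    BBV y = normalize({48.0, 32.0, 20.0, 0.0});
    std::printf("distance = %g\n", manhattan(x, y));  // near 0: similar phases
    return 0;
}
```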

1,651 citations


Proceedings ArticleDOI
01 Jun 1994
Abstract: ATOM (Analysis Tools with OM) is a single framework for building a wide range of customized program analysis tools. It provides the common infrastructure present in all code-instrumenting tools; this is the difficult and time-consuming part. The user simply defines the tool-specific details in instrumentation and analysis routines. Building a basic block counting tool like Pixie with ATOM requires only a page of code. ATOM, using OM link-time technology, organizes the final executable such that the application program and user's analysis routines run in the same address space. Information is directly passed from the application program to the analysis routines through simple procedure calls instead of inter-process communication or files on disk. ATOM takes care that analysis routines do not interfere with the program's execution, and precise information about the program is presented to the analysis routines at all times. ATOM uses no simulation or interpretation. ATOM has been implemented on the Alpha AXP under OSF/1. It is efficient and has been used to build a diverse set of tools for basic block counting, profiling, dynamic memory recording, instruction and data cache simulation, pipeline simulation, evaluating branch prediction, and instruction scheduling.

979 citations


Journal ArticleDOI
01 Jul 2000 - IEEE Computer
Abstract: As computers and software have become more powerful, it seems almost human nature to want the biggest and fastest toy you can afford. But how do you know if your toy is tops? Even if your application never does any I/O, it's not just the speed of the CPU that dictates performance. Cache, main memory, and compilers also play a role. Software applications also have differing performance requirements. So whom do you trust to provide this information? The Standard Performance Evaluation Corporation (SPEC) is a nonprofit consortium whose members include hardware vendors, software vendors, universities, customers, and consultants. SPEC's mission is to develop technically credible and objective component- and system-level benchmarks for multiple operating systems and environments, including high-performance numeric computing, Web servers, and graphical subsystems. On 30 June 2000, SPEC retired the CPU95 benchmark suite. Its replacement is CPU2000, a new CPU benchmark suite with 19 applications that have never before been in a SPEC CPU suite. The article discusses how SPEC developed this benchmark suite and what the benchmarks do.

858 citations


"Pin: building customized program an..." refers methods in this paper

  • ...We ran the SPEC2000 suite [11] using reference inputs on IA32, EM64T, and Itanium....


Journal ArticleDOI
01 Nov 2000
TL;DR: The authors present a postcompiler program manipulation tool called Dyninst, which provides a C++ class library for program instrumentation that permits machine-independent binary instrumentation programs to be written.
Abstract: The authors present a postcompiler program manipulation tool called Dyninst, which provides a C++ class library for program instrumentation. Using this library, it is possible to instrument and modify application programs during execution. A unique feature of this library is that it permits machine-independent binary instrumentation programs to be written. The authors describe the interface that a tool sees when using this library. They also discuss three simple tools built using this interface: a utility to count the number of times a function is called, a program to capture the output of an already running program to a file, and an implementation of conditional breakpoints. For the conditional breakpoint example, the authors show that by using their interface compared with gdb, they are able to execute a program with conditional breakpoints up to 900 times faster.
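
To give a flavor of the library, here is a hedged sketch of the call-counting utility mentioned above, written against Dyninst's public BPatch API (exact signatures vary across Dyninst versions): a mutator process creates the target, allocates a counter in the target's address space, and splices an increment into every entry point of a chosen function. The function name "foo" and the target path are illustrative.

```cpp
#include <vector>
#include "BPatch.h"
#include "BPatch_function.h"
#include "BPatch_point.h"
#include "BPatch_process.h"
#include "BPatch_snippet.h"

BPatch bpatch;  // one instance per mutator process

int main(int argc, char *argv[]) {
    // Create the target process, initially stopped (path/args from argv).
    BPatch_process *proc =
        bpatch.processCreate(argv[1], (const char **)(argv + 1));
    BPatch_image *image = proc->getImage();

    // Locate the function whose calls we want to count ("foo" is illustrative).
    std::vector<BPatch_function *> funcs;
    image->findFunction("foo", funcs);
    if (funcs.empty()) return 1;  // function not found

    // Allocate an int counter in the mutatee's address space.
    BPatch_variableExpr *counter = proc->malloc(*image->findType("int"));

    // Build the snippet: counter = counter + 1.
    BPatch_arithExpr incr(BPatch_assign, *counter,
        BPatch_arithExpr(BPatch_plus, *counter, BPatch_constExpr(1)));

    // Splice the increment into every entry point of the function.
    std::vector<BPatch_point *> *entries = funcs[0]->findPoint(BPatch_entry);
    proc->insertSnippet(incr, *entries);

    // Let the mutatee run to completion.
    proc->continueExecution();
    while (!proc->isTerminated())
        bpatch.waitForStatusChange();
    return 0;
}
```

Reading the counter back (or inserting a print snippet at exit) is omitted for brevity; the point is that instrumentation is expressed as machine-independent snippets inserted at abstract points, not as binary patches.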

615 citations


"Pin: building customized program an..." refers methods in this paper

  • ...Example probe-based systems include Dyninst [7], Vulcan [29], and DTrace [9]....


Proceedings Article
27 Jun 2004
TL;DR: DTrace features the ability to dynamically instrument both user-level and kernel-level software in a unified and absolutely safe fashion and features a C-like high-level control language to describe the predicates and actions at a given point of instrumentation.
Abstract: This paper presents DTrace, a new facility for dynamic instrumentation of production systems. DTrace features the ability to dynamically instrument both user-level and kernel-level software in a unified and absolutely safe fashion. When not explicitly enabled, DTrace has zero probe effect--the system operates exactly as if DTrace were not present at all. DTrace allows for many tens of thousands of instrumentation points, with even the smallest of systems offering on the order of 30,000 such points in the kernel alone. We have developed a C-like high-level control language to describe the predicates and actions at a given point of instrumentation. The language features user-defined variables, including thread-local variables and associative arrays. To eliminate the need for most postprocessing, the facility features a scalable mechanism for aggregating data and a mechanism for speculative tracing. DTrace has been integrated into the Solaris operating system and has been used to find serious systemic performance problems on production systems, problems that could not be found using pre-existing facilities.

502 citations


"Pin: building customized program an..." refers methods in this paper

  • ...Example probe-based systems include Dyninst [7], Vulcan [29], and DTrace [9]....


Network Information
Related Papers (5)

10 Jun 2007 - Nicholas Nethercote, Julian Seward
25 Oct 2008 - Christian Bienia, Sanjeev Kumar +2 more
10 Apr 2005 - Fabrice Bellard
01 May 1995 - Steven Cameron Woo, Moriyoshi Ohara +3 more

Performance Metrics

No. of citations received by the paper in previous years:

Year   Citations
2021   124
2020   226
2019   272
2018   297
2017   323
2016   313