Papers by Jeffrey Dean published in 1997

Journal Article

Continuous profiling: where have all the cycles gone?

Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika Henzinger, Shun-Tak Albert Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, William E. Weihl

01 Oct 1997

TL;DR: The Digital Continuous Profiling Infrastructure is a sampling-based profiling system designed to run continuously on production systems, supporting multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel.

Abstract: This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
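The core idea of sample-based time accounting fits in a few lines. The Python sketch below is a toy under stated assumptions, not DCPI's data formats or analysis tools: it maps hypothetical sampled program-counter values to procedures through a made-up symbol table and treats each procedure's share of samples as its share of execution time. At the quoted rate of over 5200 samples/sec on a 333MHz processor, each sample stands for at most roughly 64,000 cycles (333,000,000 / 5200 ≈ 64,000).

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start_address, procedure) pairs sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1400, "hash_lookup"), (0x2000, "memcpy")]
STARTS = [start for start, _ in SYMBOLS]

def procedure_for(pc):
    """Attribute a sampled PC to the procedure whose start address precedes it."""
    i = bisect_right(STARTS, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "<unknown>"

def time_breakdown(sampled_pcs):
    """Each sample represents one sampling interval, so a procedure's share of
    the samples estimates its share of execution time."""
    counts = Counter(procedure_for(pc) for pc in sampled_pcs)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.most_common()}

samples = [0x1404, 0x1408, 0x1410, 0x1004, 0x2010, 0x1420, 0x1000, 0x1404]
print(time_breakdown(samples))
```

The same counting applied per instruction address, rather than per procedure, is what makes instruction-level attribution possible once enough samples accumulate.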

545 citations

Proceedings Article

ProfileMe: hardware support for instruction-level profiling on out-of-order processors

Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl, George Z. Chrysos

01 Dec 1997

TL;DR: An inexpensive hardware implementation of ProfileMe is described, a variety of software techniques to extract useful profile information from the hardware are outlined, and several ways in which this information can provide valuable feedback for programmers and optimizers are explained.

Abstract: Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also supports paired sampling, which captures information about the interactions between concurrent instructions, revealing information about useful concurrency and the utilization of various pipeline stages while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software techniques to extract useful profile information from the hardware, and explain several ways in which this information can provide valuable feedback for programmers and optimizers.
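On the software side, ProfileMe-style records have to be aggregated per instruction before they become useful feedback. The following minimal Python sketch assumes a hypothetical record layout (`SampleRecord` with a PC, a retired flag, a set of event names, and per-stage latencies); it is not the paper's hardware register format, only an illustration of turning individual instruction samples into event rates and average stage latencies.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class SampleRecord:
    """One hypothetical ProfileMe-style record for a sampled instruction."""
    pc: int
    retired: bool                                      # False if the instruction aborted
    events: frozenset = frozenset()                    # e.g. {"dcache_miss"}
    stage_latency: dict = field(default_factory=dict)  # pipeline stage -> cycles

def summarize(records):
    """Aggregate records per PC into retire rate, event rates and mean stage latencies."""
    by_pc = defaultdict(list)
    for r in records:
        by_pc[r.pc].append(r)
    summary = {}
    for pc, rs in by_pc.items():
        n = len(rs)
        event_counts = defaultdict(int)
        lat_sum, lat_n = defaultdict(int), defaultdict(int)
        for r in rs:
            for e in r.events:
                event_counts[e] += 1
            for stage, cycles in r.stage_latency.items():
                lat_sum[stage] += cycles
                lat_n[stage] += 1
        summary[pc] = {
            "samples": n,
            "retire_rate": sum(r.retired for r in rs) / n,
            "event_rate": {e: c / n for e, c in event_counts.items()},
            "mean_latency": {s: lat_sum[s] / lat_n[s] for s in lat_sum},
        }
    return summary

records = [
    SampleRecord(0x40, True, frozenset({"dcache_miss"}), {"issue": 12, "retire": 30}),
    SampleRecord(0x40, True, frozenset(), {"issue": 2, "retire": 6}),
    SampleRecord(0x44, False, frozenset({"branch_mispredict"}), {"issue": 1}),
]
stats = summarize(records)
print(stats[0x40]["mean_latency"])  # average cycles per stage for the instruction at 0x40
```

Because each record belongs to one specific instruction, events such as cache misses are attributed exactly, rather than inferred from event counters that may skid on an out-of-order machine.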

338 citations

Proceedings Article

Call graph construction in object-oriented languages

David Grove¹, Greg DeFouw¹, Jeffrey Dean¹, Craig Chambers¹

University of Washington¹

09 Oct 1997

TL;DR: In this paper, a parameterized algorithmic framework for call graph construction in the presence of message sends and/or first class functions is presented, which is used to describe and implement a number of well-known and new algorithms.

Abstract: Interprocedural analyses enable optimizing compilers to more precisely model the effects of non-inlined procedure calls, potentially resulting in substantial increases in application performance. Applying interprocedural analysis to programs written in object-oriented or functional languages is complicated by the difficulty of constructing an accurate program call graph. This paper presents a parameterized algorithmic framework for call graph construction in the presence of message sends and/or first class functions. We use this framework to describe and to implement a number of well-known and new algorithms. We then empirically assess these algorithms by applying them to a suite of medium-sized programs written in Cecil and Java, reporting on the relative cost of the analyses, the relative precision of the constructed call graphs, and the impact of this precision on the effectiveness of a number of interprocedural optimizations.
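To make call graph construction in the presence of message sends concrete, here is a sketch of one well-known algorithm of the kind such a framework can express, class hierarchy analysis (CHA), written in Python over a toy class hierarchy. The program model (`SUPER`, `DEFINES`, `CALL_SITES`) is invented for illustration; this is not the paper's parameterized framework, and it ignores first-class functions entirely.

```python
from collections import defaultdict

# Toy program model (invented): class -> superclass, class -> methods it defines.
SUPER = {"Shape": None, "Circle": "Shape", "Square": "Shape", "Unit": "Square"}
DEFINES = {"Shape": {"area"}, "Circle": {"area"}, "Square": {"area"}, "Unit": set()}
# Call sites: (calling procedure, static type of the receiver, message name).
CALL_SITES = [("main", "Shape", "area"), ("main", "Square", "area")]

def subclasses(cls):
    """cls together with all of its transitive subclasses."""
    result, changed = {cls}, True
    while changed:
        changed = False
        for c, parent in SUPER.items():
            if parent in result and c not in result:
                result.add(c)
                changed = True
    return result

def lookup(cls, method):
    """Method lookup: walk up the superclass chain to the defining class."""
    while cls is not None:
        if method in DEFINES[cls]:
            return cls
        cls = SUPER[cls]
    return None

def cha_targets(static_type, method):
    """CHA: any subclass of the static receiver type may be the runtime receiver."""
    targets = set()
    for c in subclasses(static_type):
        defining = lookup(c, method)
        if defining is not None:
            targets.add(f"{defining}.{method}")
    return targets

def build_call_graph(call_sites):
    """Union the possible targets of every call site into caller -> callees edges."""
    graph = defaultdict(set)
    for caller, static_type, method in call_sites:
        graph[caller] |= cha_targets(static_type, method)
    return dict(graph)

print(build_call_graph(CALL_SITES))
```

A more precise algorithm, for example one that also tracks which classes are ever instantiated, could drop `Circle.area` if no `Circle` is created; weighing such precision against analysis cost is the kind of trade-off the abstract describes measuring.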

338 citations

Patent

Method for scheduling threads in a multithreaded processor

George Z. Chrysos, Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl

26 Nov 1997

TL;DR: In this article, a method for scheduling execution of a plurality of threads executed in a multithreaded processor is presented.

Abstract: A method is provided for scheduling execution of a plurality of threads executed in a multithreaded processor. Resource utilizations of each of the plurality of threads are measured while the plurality of threads are concurrently executing in the multithreaded processor. Each of the plurality of threads is scheduled according to the measured resource utilizations using a thread scheduler.
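A scheduler driven by measured resource utilization could, for example, prefer to co-schedule threads whose demands complement each other. The Python sketch below assumes per-thread utilization fractions for hypothetical resources (`fp_unit`, `load_store`) and a simple min-max contention policy; the abstract does not specify this policy, so treat it as one illustrative choice.

```python
from itertools import combinations

def contention(a, b):
    """Largest combined demand on any one resource if threads a and b run together."""
    return max(a.get(r, 0.0) + b.get(r, 0.0) for r in set(a) | set(b))

def pick_coschedule(utilization):
    """Hypothetical policy: co-schedule the pair whose measured demands clash least."""
    return min(combinations(sorted(utilization), 2),
               key=lambda pair: contention(utilization[pair[0]], utilization[pair[1]]))

# Measured fraction of each (assumed) resource used by each thread.
measured = {
    "T1": {"fp_unit": 0.8, "load_store": 0.2},
    "T2": {"fp_unit": 0.7, "load_store": 0.3},
    "T3": {"fp_unit": 0.1, "load_store": 0.5},
}
print(pick_coschedule(measured))
```

Pairing the two floating-point-heavy threads would oversubscribe `fp_unit` (0.8 + 0.7), so the policy pairs `T3` with one of them instead.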

169 citations

Patent

Method for estimating execution rates of program execution paths

Jeffrey Dean, Robert A. Eustace, James W. Hicks¹, Carl A. Waldspurger¹, William E. Weihl¹

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this article, a method for estimating execution rates of program execution paths is presented, based on sampling path-identifying state information of selected instructions while the program executes in a processor.

Abstract: A method is provided for estimating execution rates of program execution paths. The method samples path-identifying state information of selected instructions while executing the program in a processor. A control flow graph of the program is supplied; the control flow graph includes a plurality of path segments. The control flow graph is analyzed using the path-identifying state information to identify a set of path segments that are consistent with the sampled state information. The set of path segments can be counted to determine their relative execution frequencies.
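The path-consistency idea can be sketched as a backward walk over the control flow graph. In this minimal Python sketch, the sampled path-identifying state is assumed to be the outcomes of the most recent conditional branches (oldest first) plus the basic block of the sampled instruction; the CFG, block names, and the even splitting of ambiguous samples across candidate paths are all assumptions, not details from the patent.

```python
from collections import Counter

# Hypothetical acyclic CFG: (src, dst, outcome), where outcome is "T"/"N" for the
# taken/not-taken edge of a conditional branch and None for unconditional flow.
EDGES = [
    ("A", "B", "T"), ("A", "C", "N"),
    ("B", "D", None), ("C", "D", None),
    ("D", "E", "T"), ("D", "F", "N"),
]

def consistent_paths(block, history, edges=EDGES, max_len=32):
    """Path segments ending at `block` whose most recent conditional-branch
    outcomes equal `history` (oldest first)."""
    preds = {}
    for src, dst, outcome in edges:
        preds.setdefault(dst, []).append((src, outcome))
    found = []

    def walk(node, path, remaining):
        if not remaining:
            found.append(tuple(reversed(path)))
            return
        if len(path) >= max_len:  # guard against unbounded walks through loops
            return
        for src, outcome in preds.get(node, []):
            if outcome is None:
                walk(src, path + [src], remaining)
            elif outcome == remaining[-1]:  # walking backward: newest outcome first
                walk(src, path + [src], remaining[:-1])

    walk(block, [block], list(history))
    return found

def estimate_path_frequencies(samples):
    """Credit each sample to the path segments consistent with it, splitting
    ambiguous samples evenly (an assumption of this sketch)."""
    freq = Counter()
    for block, history in samples:
        paths = consistent_paths(block, history)
        for p in paths:
            freq[p] += 1 / len(paths)
    return freq

samples = [("E", ("T", "T")), ("E", ("T", "T")), ("F", ("N", "N"))]
print(estimate_path_frequencies(samples))
```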

148 citations

Patent

Method for scheduling contexts based on statistics of memory system interactions in a computer system

Jeffrey Dean, Carl A. Waldspurger

26 Nov 1997

TL;DR: In this paper, a method for scheduling execution contexts in a computer system based on memory interactions is proposed, where the computer system includes a processor and a hierarchical memory arranged in a plurality of levels.

Abstract: A method schedules execution contexts in a computer system based on memory interactions. The computer system includes a processor and a hierarchical memory arranged in a plurality of levels. Memory transactions are randomly sampled for a plurality of contexts. The contexts can be threads, processes, or hardware contexts. Resource interactions of the plurality of contexts are estimated, and particular contexts are chosen to be scheduled based on the estimated resource interactions.
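One way to turn randomly sampled memory transactions into a scheduling decision is to estimate how much data two contexts share. The Python sketch below assumes 64-byte cache lines and uses the Jaccard overlap of sampled cache-line addresses as a hypothetical interaction measure, picking the context most likely to share cached data with a given one; the patent's actual estimators may differ.

```python
LINE_BYTES = 64  # assumed cache-line size

def sampled_lines(addresses):
    """Cache lines touched by a context's sampled memory transactions."""
    return {a // LINE_BYTES for a in addresses}

def sharing(a, b):
    """Hypothetical interaction estimate: Jaccard overlap of sampled cache lines."""
    la, lb = sampled_lines(a), sampled_lines(b)
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0

def best_partner(samples, ctx):
    """Context most likely to share cached data with `ctx`, e.g. to run on the same processor."""
    others = [c for c in samples if c != ctx]
    return max(others, key=lambda c: sharing(samples[ctx], samples[c]))

# Sampled data addresses per context (threads, processes, or hardware contexts).
mem_samples = {"P1": [0, 64, 128], "P2": [8, 72, 4096], "P3": [8192, 8256]}
print(best_partner(mem_samples, "P1"))
```

The opposite policy, keeping contexts with disjoint but cache-hungry footprints apart, uses the same sampled data with a different scoring function.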

143 citations

Patent

Method for inserting memory prefetch operations based on measured latencies in a program optimizer

Jennifer M. Anderson, Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl

26 Nov 1997

TL;DR: In this article, a method for optimizing a program by inserting memory prefetch operations in the program executing in a computer system is presented, where a program optimizer uses the measured latencies to estimate the number of cycles that elapse before data of a memory operation are available.

Abstract: A method is provided for optimizing a program by inserting memory prefetch operations in the program executing in a computer system. The computer system includes a processor and a memory. Latencies of instructions of the program are measured by hardware while the instructions are processed by a pipeline of the processor. Memory prefetch instructions are automatically inserted in the program based on the measured latencies to optimize execution of the program. The latencies measure the time from when a load instruction issues a request for data to the memory until the data are available in the processor. A program optimizer uses the measured latencies to estimate the number of cycles that elapse before data of a memory operation are available.
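A common way to use a measured load latency is to compute a prefetch distance: how many loop iterations ahead a prefetch must be issued so that the data arrive before they are used. Assuming a loop body takes c cycles per iteration and the measured latency is L cycles, the distance is ⌈L / c⌉ iterations. The Python sketch below applies that standard heuristic; the latency threshold and the dictionary of per-load latencies are hypothetical, and the patent's optimizer may use a different model.

```python
import math

def prefetch_distance(latency_cycles, cycles_per_iteration):
    """Iterations ahead to prefetch so data arrive before use: ceil(L / c)."""
    return math.ceil(latency_cycles / cycles_per_iteration)

def plan_prefetches(measured_latency, cycles_per_iteration, threshold=20):
    """Map each load PC whose measured latency exceeds a (hypothetical) threshold
    to its prefetch distance; short-latency loads get no prefetch."""
    return {pc: prefetch_distance(lat, cycles_per_iteration)
            for pc, lat in measured_latency.items() if lat > threshold}

# Hypothetical measured latencies (cycles) for two loads in a 25-cycle loop body.
print(plan_prefetches({0x100: 120, 0x104: 3}, cycles_per_iteration=25))
```

For the load at `0x100`, 120 / 25 = 4.8, so the prefetch targets data needed 5 iterations ahead; the load at `0x104` already hits in cache and is left alone.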

89 citations

Patent

Method for measuring latencies by randomly selected sampling of the instructions while the instruction are executed

Jennifer M. Anderson, Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl

26 Nov 1997

TL;DR: In this article, a method is presented for scheduling instructions executed in a computer system including a processor and a memory subsystem, in which pipeline latencies and resource utilization are measured by sampling hardware while the instructions are executing.

Abstract: In a method for scheduling instructions executed in a computer system including a processor and a memory subsystem, pipeline latencies and resource utilization are measured by sampling hardware while the instructions are executing. The instructions are then scheduled according to the measured latencies and resource utilizations using an instruction scheduler.
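Measured latencies can replace the fixed latency tables an instruction scheduler normally assumes. The Python sketch below is a textbook single-issue list scheduler whose priority is the critical-path height computed from measured latencies; the instruction names and latency values are invented, and this illustrates the general technique rather than the patent's scheduler.

```python
def list_schedule(deps, latency):
    """Single-issue list scheduling driven by measured latencies.

    deps:    instruction -> list of instructions whose results it needs
    latency: instruction -> measured cycles until its result is available
    Returns a list of (issue_cycle, instruction) pairs.
    """
    succs = {i: [] for i in deps}
    for i, needs in deps.items():
        for d in needs:
            succs[d].append(i)

    height = {}
    def critical_path(i):
        # Longest latency-weighted path from i to the end of the block.
        if i not in height:
            height[i] = latency[i] + max((critical_path(s) for s in succs[i]), default=0)
        return height[i]
    for i in deps:
        critical_path(i)

    result_ready, schedule, cycle = {}, [], 0
    pending = set(deps)
    while pending:
        ready = sorted(i for i in pending
                       if all(d in result_ready and result_ready[d] <= cycle for d in deps[i]))
        if ready:
            pick = max(ready, key=height.get)  # ties go to the alphabetically first name
            schedule.append((cycle, pick))
            result_ready[pick] = cycle + latency[pick]
            pending.remove(pick)
        cycle += 1  # one issue slot per cycle; empty cycles are stalls
    return schedule

# Hypothetical block: a load that sampling shows often misses (10 cycles).
deps = {"ld": [], "add": ["ld"], "mul": [], "st": ["add"]}
latency = {"ld": 10, "add": 1, "mul": 4, "st": 1}
print(list_schedule(deps, latency))
```

With the load's measured latency, its critical-path priority makes it issue first, and the independent `mul` fills a slot while the load is outstanding.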

85 citations

Patent

Apparatus for sampling instruction execution information in a processor pipeline

George Z. Chrysos, Jeffrey Dean, James W. Hicks, Carl A. Waldspurger¹, William E. Weihl¹, Daniel L. Leibholz¹, Edward J. Mclellan¹

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this article, an apparatus is provided for sampling instructions in a processor pipeline of a computer system, where instructions are fetched into a first stage of the pipeline and a subset of the fetched instructions are identified as selected instructions.

Abstract: An apparatus is provided for sampling instructions in a processor pipeline of a computer system. The pipeline has a plurality of processing stages. Instructions are fetched into a first stage of the pipeline. A subset of the fetched instructions is identified as selected instructions. Event, latency, and state information of the system is sampled while any of the selected instructions are in any stage of the pipeline. Software is informed whenever any of the selected instructions leaves the pipeline, so that it can read the event and latency information.
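The selection mechanism can be modeled in software to see how sampling behaves. In this Python sketch, a countdown of fetched instructions selects the next sample and is reloaded with a randomized interval so that sampling does not lock onto loop periods; the class name, the `mean_interval`/`jitter` parameters, and the randomization scheme are assumptions, not the hardware design described in the patent. For simplicity, each instruction leaves the pipeline immediately after it is fetched.

```python
import random

class InstructionSampler:
    """Software model of fetch-time instruction selection. The class, its
    parameters and the uniform randomized reload are assumptions."""

    def __init__(self, mean_interval, jitter, on_sample, seed=0):
        if mean_interval - jitter < 1:
            raise ValueError("sampling interval must stay at least 1")
        self.mean, self.jitter = mean_interval, jitter
        self.on_sample = on_sample  # stands in for "software is informed"
        self.rng = random.Random(seed)
        self.countdown = self._reload()

    def _reload(self):
        # Randomized interval so sampling does not alias with loop periods.
        return self.rng.randint(self.mean - self.jitter, self.mean + self.jitter)

    def fetch(self, pc):
        """Called once per fetched instruction; True means it is selected."""
        self.countdown -= 1
        if self.countdown == 0:
            self.countdown = self._reload()
            return True
        return False

    def leave_pipeline(self, pc, selected, events, latencies):
        """Hand a selected instruction's record to software when it retires or aborts."""
        if selected:
            self.on_sample({"pc": pc, "events": events, "latencies": latencies})

records = []
sampler = InstructionSampler(mean_interval=100, jitter=10, on_sample=records.append, seed=1)
for pc in range(10_000):
    sampler.leave_pipeline(pc, sampler.fetch(pc), (), {})
print(len(records))  # roughly 10,000 / 100 samples
```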

65 citations

Patent

Method and apparatus for sampling multiple potentially concurrent instructions in a processor pipeline

George Z. Chrysos, Jeffrey Dean, James W. Hicks¹, Daniel L. Leibholz¹, Edward J. Mclellan¹, Carl A. Waldspurger¹, William E. Weihl¹

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this paper, an apparatus is provided for sampling multiple concurrently executing instructions in a processor pipeline of a system, where state information of the system is sampled while any of the selected instructions are in any stage of the pipeline.

Abstract: An apparatus is provided for sampling multiple concurrently executing instructions in a processor pipeline of a system. The pipeline has a plurality of processing stages. The apparatus identifies multiple selected instructions when the instructions are fetched into a first stage of the pipeline. A subset of the multiple selected instructions executes concurrently in the pipeline. State information of the system is sampled while any of the multiple selected instructions are in any stage of the pipeline. Software is informed whenever all of the selected instructions leave the pipeline so that the software can read any of the state information.
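Paired samples become informative once the two in-flight intervals are compared. The Python sketch below assumes each sampled instruction is summarized as a hypothetical `(fetch_cycle, leave_cycle)` interval, computes how many cycles the pair overlapped, and averages the overlapped fraction across pairs as a rough concurrency indicator; this metric is an illustration, not one defined in the patent.

```python
def overlap_cycles(first, second):
    """Cycles during which both sampled instructions were in flight.
    Each argument is a hypothetical (fetch_cycle, leave_cycle) interval."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]))

def mean_overlap_fraction(pairs):
    """Average fraction of the first instruction's lifetime overlapped by the
    second; a rough, illustrative concurrency indicator."""
    fractions = [overlap_cycles(a, b) / (a[1] - a[0]) for a, b in pairs if a[1] > a[0]]
    return sum(fractions) / len(fractions) if fractions else 0.0

pairs = [((0, 20), (5, 10)), ((0, 10), (12, 20))]
print(mean_overlap_fraction(pairs))
```

A single-instruction sample cannot show whether a long-latency instruction was hidden behind other work; the overlap of a pair can.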

63 citations

Patent

Method for providing virtual memory to physical memory page mapping in a computer operating system that randomly samples state information

Jeffrey Dean, James W. Hicks, William E. Weihl¹

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this article, a method for guiding virtual-to-physical mapping policies in a computer system including a processor and a memory is provided, where state information is randomly sampled from selected memory references in a stream of memory references issued by the processor to the memory.

Abstract: A method is provided for guiding virtual-to-physical mapping policies in a computer system including a processor and a memory. State information is randomly sampled from selected memory references in a stream of memory references issued by the processor to the memory. Cache hit/miss status, translation lookaside buffer hit/miss status, and effective virtual and physical memory addresses of the sampled memory references are recorded in a profile record. The recorded information is aggregated by virtual memory address, and a new virtual-to-physical mapping is chosen to reduce cache and translation lookaside buffer miss rates.
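Aggregating sampled references by virtual page and then choosing a new mapping resembles page coloring. The Python sketch below assumes 4096-byte pages, four cache colors (physical page number modulo 4), and sample tuples of (virtual address, physical address, cache miss, TLB miss); the greedy policy of spreading the most-missed pages across distinct colors is a hypothetical choice, not the method claimed in the patent.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size in bytes
NUM_COLORS = 4    # assumed number of page colors (cache size / (page size * associativity))

def current_colors(samples):
    """Color of each sampled virtual page under its current physical mapping."""
    return {v // PAGE_SIZE: (p // PAGE_SIZE) % NUM_COLORS for v, p, _c, _t in samples}

def hot_pages(samples):
    """Count sampled cache or TLB misses per virtual page."""
    misses = Counter()
    for vaddr, _paddr, cache_miss, tlb_miss in samples:
        if cache_miss or tlb_miss:
            misses[vaddr // PAGE_SIZE] += 1
    return misses

def recolor(colors, misses):
    """Hypothetical greedy policy: give the most-missed pages distinct colors,
    round-robin, so they stop competing for the same cache sets."""
    new_colors = dict(colors)
    for rank, (vpage, _count) in enumerate(misses.most_common()):
        new_colors[vpage] = rank % NUM_COLORS
    return new_colors

# (virtual address, physical address, cache miss?, TLB miss?)
accesses = [
    (0x0000, 0x0000, True, False),
    (0x1000, 0x4000, True, False),
    (0x1004, 0x4004, True, False),
    (0x2000, 0x8000, False, False),
]
print(recolor(current_colors(accesses), hot_pages(accesses)))
```

Here virtual pages 0 and 1 both map to color 0 and both miss, so the sketch moves them to different colors; the operating system would then copy page 0 to a physical page of the new color.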

Patent

Apparatus for determining the instantaneous average number of instructions processed

George Z. Chrysos, Jeffrey Dean, James W. Hicks, Carl A. Waldspurger¹, William E. Weihl

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this paper, an apparatus is described that calculates the average number of instructions entering a stage of a processor pipeline of a computer system during a clock cycle of the processor clock.

Abstract: An apparatus is provided for determining an average number of instructions entering a stage of a processor pipeline of a computer system during a clock cycle of a processor clock. The number of instructions entering a particular stage of the pipeline is stored in a queue during each of a predetermined number (N) of clock cycles. The total number of instructions processed over the last P clock cycles is computed, where P is less than or equal to N. This total is divided by P to yield the instantaneous average number of instructions processed per processor cycle. This average is communicated to software.
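The computation described above is a moving average over the last P of N recorded cycles, (n₁ + … + n_P) / P, where nᵢ is the count for the i-th most recent cycle. A minimal Python model, assuming a software queue stands in for the hardware one (the class and method names are hypothetical):

```python
from collections import deque

class InstantaneousAverage:
    """Software model of the per-cycle count queue (names are hypothetical)."""

    def __init__(self, n):
        self.counts = deque(maxlen=n)  # keeps only the last N per-cycle counts

    def clock(self, entering_stage):
        """Record how many instructions entered the stage this cycle."""
        self.counts.append(entering_stage)

    def average(self, p):
        """Total over the last P cycles divided by P, for 0 < P <= N."""
        if not 0 < p <= self.counts.maxlen:
            raise ValueError("P must satisfy 0 < P <= N")
        if len(self.counts) < p:
            raise ValueError("fewer than P cycles recorded so far")
        recent = list(self.counts)[-p:]
        return sum(recent) / p

avg = InstantaneousAverage(8)
for count in [4, 4, 2, 0, 1, 3]:
    avg.clock(count)
print(avg.average(4))  # (2 + 0 + 1 + 3) / 4
```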

Patent

Apparatus for sampling path history in a processor pipeline

George Z. Chrysos, Jeffrey Dean, Robert A. Eustace¹, James W. Hicks¹, Carl A. Waldspurger¹, William E. Weihl¹

Hewlett-Packard¹

26 Nov 1997

TL;DR: In this article, an apparatus is provided for collecting state information associated with an execution path of recently processed instructions in a processor pipeline of a computer system, and a shift register stores a predetermined number of entries storing selected state information, which is simultaneously sampled along with additional state information about the instruction being executed at the time of sampling.

Abstract: An apparatus is provided for collecting state information associated with an execution path of recently processed instructions in a processor pipeline of a computer system. The apparatus identifies a class of instructions to be sampled. Path-identifying state information of a currently processed instruction is sampled when the currently processed instruction belongs to the identified class of instructions. A shift register stores a predetermined number of entries of selected state information, and the shift register is sampled simultaneously with additional state information about the instruction being executed at the time of sampling.
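The shift-register mechanism can be modeled with a fixed-length queue. In this Python sketch, the identified class of instructions is assumed to be conditional branches and the recorded state is a single taken/not-taken bit per branch; the class and method names are hypothetical.

```python
from collections import deque

class PathHistory:
    """Software model of a path-history shift register (names are hypothetical)."""

    def __init__(self, depth, sampled_class):
        self.register = deque([0] * depth, maxlen=depth)  # oldest entry first
        self.sampled_class = sampled_class

    def process(self, kind, state):
        """Shift in `state` only for instructions of the identified class."""
        if kind == self.sampled_class:
            self.register.append(state)

    def sample(self, current_state):
        """Read the whole register together with the current instruction's state."""
        return {"history": tuple(self.register), "current": current_state}

ph = PathHistory(depth=4, sampled_class="cond_branch")  # assumed class: conditional branches
for kind, taken in [("cond_branch", 1), ("load", None), ("cond_branch", 0), ("cond_branch", 1)]:
    ph.process(kind, taken)
print(ph.sample({"pc": 0x80}))
```

The load does not shift the register, so the sampled history reflects only the last four branch outcomes.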

ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors

Jeffrey Dean, James W. Hicks, Carl A. Waldspurger, William E. Weihl, George Z. Chrysos

01 Jan 1997

Abstract: Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor’s performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also supports paired sampling, which captures information about the interactions between concurrent instructions, revealing information about useful concurrency and the utilization of various pipeline stages while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software techniques to extract useful profile information from the hardware, and explain several ways in which this information can provide valuable feedback for programmers and optimizers.
