
Measuring empirical computational complexity

TLDR
The Trend Profiler (trend-prof), a tool for constructing models of empirical computational complexity that predict how many times each basic block in a program runs as a linear or a power-law function of user-specified features of the program's workloads, is described.
Abstract
The standard language for describing the asymptotic behavior of algorithms is theoretical computational complexity. We propose a method for describing the asymptotic behavior of programs in practice by measuring their empirical computational complexity. Our method involves running a program on workloads spanning several orders of magnitude in size, measuring their performance, and fitting these observations to a model that predicts performance as a function of workload size. Comparing these models to the programmer's expectations or to theoretical asymptotic bounds can reveal performance bugs or confirm that a program's performance scales as expected. Grouping and ranking program locations based on these models focuses attention on scalability-critical code. We describe our tool, the Trend Profiler (trend-prof), for constructing models of empirical computational complexity that predict how many times each basic block in a program runs as a linear (y = a + bx) or a power-law (y = ax^b) function of user-specified features of the program's workloads. We ran trend-prof on several large programs and report cases where a program scaled as expected, beat its worst-case theoretical complexity bound, or had a performance bug.
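
To make the fitting step concrete, here is a minimal sketch (not trend-prof's actual implementation) of how one basic block's execution counts might be fit to the two model forms named above. The workload feature x, the counts y, and the use of NumPy's polyfit are illustrative assumptions; the power-law fit is done as a straight line in log-log space.

import numpy as np

# Hypothetical data: a user-specified workload feature x (e.g., input size)
# and the number of times one basic block executed on each workload.
x = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
y = np.array([4.1e2, 3.9e3, 4.2e4, 4.0e5, 4.1e6])

# Linear model y = a + b*x, fit by ordinary least squares.
b_lin, a_lin = np.polyfit(x, y, 1)

# Power-law model y = a * x^b, fit as a line in log-log space:
# log y = log a + b * log x.
b_pow, log_a = np.polyfit(np.log(x), np.log(y), 1)
a_pow = np.exp(log_a)

# Compare the two fits by residual error (trend-prof's actual selection
# criterion may differ; this is only an illustration).
rms = lambda r: np.sqrt(np.mean(r ** 2))
print("linear:    y = %.3g + %.3g*x   (RMS residual %.3g)"
      % (a_lin, b_lin, rms(y - (a_lin + b_lin * x))))
print("power law: y = %.3g * x^%.3g   (RMS residual %.3g)"
      % (a_pow, b_pow, rms(y - a_pow * x ** b_pow)))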


Measuring Empirical Computational Complexity
by
Simon Fredrick Goldsmith
B.S. (Carnegie Mellon University) 2001
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Alex Aiken, Co-chair
Professor Koushik Sen, Co-chair
Professor Rastislav Bodik
Professor Dor Abrahamson
Fall 2008

The dissertation of Simon Fredrick Goldsmith is approved:
Co-chair Date
Co-chair Date
Date
Date
University of California, Berkeley
Fall 2008

Measuring Empirical Computational Complexity
Copyright 2008
by
Simon Fredrick Goldsmith

Abstract
Measuring Empirical Computational Complexity
by
Simon Fredrick Goldsmith
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor Alex Aiken, Co-chair
Professor Koushik Sen, Co-chair
Scalability is a fundamental problem in computer science. Computer scientists often describe the scalability of algorithms in the language of theoretical computational complexity, bounding the number of operations an algorithm performs as a function of the size of its input. The main contribution of this dissertation is to provide an analogous description of the scalability of actual software implementations run on realistic workloads.

We propose a method for describing the asymptotic behavior of programs in practice by measuring their empirical computational complexity. Our method involves running a program on workloads spanning several orders of magnitude in size, measuring their performance, and fitting these observations to a model that predicts performance as a function of workload size. Comparing these models to the programmer's expectations or to theoretical asymptotic bounds can reveal performance bugs or confirm that a program's performance scales as expected.
We develop our methodology for constructing these models of empirical complexity as we describe and evaluate two techniques. Our first technique, BB-TrendProf, constructs models that predict how many times each basic block runs as a linear (y = a + bx) or a power-law (y = ax^b) function of user-specified features of the program's workloads. To present output succinctly and focus attention on scalability-critical code, BB-TrendProf groups and ranks program locations based on these models. We demonstrate the power of BB-TrendProf compared to existing tools by running it on several large programs and reporting cases where its models show (1) an implementation of a complex algorithm scaling as expected, (2) two complex algorithms beating their worst-case theoretical complexity bounds when run on realistic inputs, and (3) a performance bug.

Our second technique, CF-TrendProf, models the performance of loops and functions both per-function-invocation and per-workload. It improves upon the precision of BB-TrendProf's models by using control flow to generate candidates from a richer family of models and a novel model selection criterion to select among these candidates. We show that CF-TrendProf's improvements to model generation and selection allow it to correctly characterize or closely approximate the empirical scalability of several well-known algorithms and data structures and to diagnose several synthetic, but realistic, scalability problems without observing an egregiously expensive workload. We also show that CF-TrendProf deals with multiple workload features better than BB-TrendProf. We qualitatively compare the output of BB-TrendProf and CF-TrendProf and discuss their relative strengths and weaknesses.
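
As a rough illustration of the candidate-generation-and-selection idea behind CF-TrendProf (the dissertation's actual model family and selection criterion are not reproduced on this page, so the candidate set, the x*log(x) form, and the held-out-error rule below are assumptions), one can fit several candidate forms to per-workload measurements and keep whichever predicts held-out workloads best:

import numpy as np

def fit_linear(x, y):
    b, a = np.polyfit(x, y, 1)
    return (lambda t: a + b * t), "y = %.3g + %.3g*x" % (a, b)

def fit_powerlaw(x, y):
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    a = np.exp(log_a)
    return (lambda t: a * t ** b), "y = %.3g*x^%.3g" % (a, b)

def fit_nlogn(x, y):
    # One example of a richer candidate form: y = a + b * x*log(x).
    z = x * np.log(x)
    b, a = np.polyfit(z, y, 1)
    return (lambda t: a + b * t * np.log(t)), "y = %.3g + %.3g*x*log(x)" % (a, b)

def select_model(x_train, y_train, x_held, y_held):
    # Fit every candidate on the training workloads and keep the one with
    # the smallest mean relative error on the held-out workloads.
    best = None
    for fit in (fit_linear, fit_powerlaw, fit_nlogn):
        predict, description = fit(x_train, y_train)
        err = np.mean(np.abs(predict(x_held) - y_held) / y_held)
        if best is None or err < best[0]:
            best = (err, description)
    return best

# Hypothetical per-workload counts for one loop; the data are made up.
x = np.array([1e2, 1e3, 1e4, 1e5, 1e6, 1e7])
y = 7.0 * x * np.log(x) + 50.0
print(select_model(x[:4], y[:4], x[4:], y[4:]))

Holding out workloads is one simple guard against the overfitting danger mentioned in the FAQ section below; it is not necessarily how CF-TrendProf guards against it.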

Citations
Proceedings ArticleDOI

SPEED: precise and efficient static estimation of program computational complexity

TL;DR: An inter-procedural technique for computing symbolic bounds on the number of statements a procedure executes, in terms of its scalar inputs and user-defined quantitative functions of input data structures, and an algorithm for automating this proof methodology are introduced.
Proceedings ArticleDOI

Control-flow refinement and progress invariants for bound analysis

TL;DR: This paper describes two techniques, control-flow refinement and progress invariants, that together enable estimation of precise bounds for procedures with nested and multi-path loops, and presents an algorithm that uses progress invariants to compute precise bounds for nested loops.
Proceedings ArticleDOI

PerfFuzz: automatically generating pathological inputs

TL;DR: PerfFuzz, a method to automatically generate inputs that exercise pathological behavior across program locations without any domain knowledge, is presented; it outperforms prior work by generating inputs that exercise the most-hit program branch 5x to 69x more often and result in 1.9x to 24.7x longer total execution paths.
Journal ArticleDOI

Empirical hardness models: Methodology and a case study on combinatorial auctions

TL;DR: The use of supervised machine learning is proposed to build models that predict an algorithm's runtime given a problem instance, and techniques for interpreting these models are described to gain understanding of the characteristics that cause instances to be hard or easy.
Proceedings Article

Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

TL;DR: This paper proposes the SPORE (Sparse POlynomial REgression) methodology to build accurate prediction models of program performance using feature data collected from program execution on sample inputs, and shows that SPORE methods can give accurate predictions with relative error of less than 7% using a moderate number of training samples.
Frequently Asked Questions (9)
Q1. What is the effect of clustering on performance?

Clustering dramatically reduces the number of degrees of freedom of the overall performance model; that is, it simplifies the presentation of program performance by reducing the number of program components whose costs the authors model.
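
As a sketch of what such clustering could look like (an illustration only: the grouping rule used here, proportional count vectors across workloads, and the data are assumptions, not trend-prof's actual algorithm):

import numpy as np

def cluster_locations(counts):
    # Group program locations whose per-workload count vectors are
    # proportional to one another, so each group needs only one model.
    # `counts` maps a location name to its counts, one entry per workload.
    clusters = []  # list of (representative vector, [location names])
    for loc, vec in counts.items():
        vec = np.asarray(vec, dtype=float)
        for rep, members in clusters:
            scale = vec.sum() / rep.sum()
            if np.allclose(vec, scale * rep, rtol=1e-6):
                members.append(loc)
                break
        else:
            clusters.append((vec, [loc]))
    return [members for _, members in clusters]

# Hypothetical counts for three basic blocks over four workloads.
counts = {
    "block_12": [10, 100, 1000, 10000],
    "block_13": [20, 200, 2000, 20000],   # proportional to block_12
    "block_47": [5, 40, 900, 16000],      # scales differently
}
print(cluster_locations(counts))  # [['block_12', 'block_13'], ['block_47']]

With proportional counts collapsed into one group, only one trend per group needs to be fit and reported, which is the reduction in degrees of freedom the answer above refers to.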

Contexts are useful for apportioning the performance cost of a library to different callers, or even the costs of data-structure operations to different instances of the data structure. 

CF-TrendProf allows the user to mark function invocations with contexts based on the call graph or on arbitrary runtime data values. 

The log-log scatter plot would seem to be more appropriate for understanding the relationship between performance and workload features since, unlike on a linear-linear scatter plot, a constant relative error corresponds to a constant distance. 
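
To spell out the geometric point (a short derivation, not quoted from the dissertation): if an observed count y deviates from a prediction p by relative error r, so that y = (1 + r) * p, then its vertical offset on a log-log plot is

log(y) - log(p) = log((1 + r) * p) - log(p) = log(1 + r),

which does not depend on p. On a linear-linear plot the offset is y - p = r * p, which grows with the prediction, so the same 5% error looks negligible for small workloads and enormous for large ones.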

There is a danger that if the authors allow their models to grow gratuitously complex, they will overfit the training data and not generalize to other data. 

The mechanism is to annotate each function with types that describe how many steps the function takes to compute its result and use a dependent type system to verify these bounds. 

In order to realize such an ideal situation in the context of a larger algorithm, a user would likely have to provide CF-TrendProf with suitable context annotations and invocation features. 

This dependence of performance on such subtle properties means that the apparent scalability of an algorithm that CF-TrendProf measures is as much a consequence of the code being measured as it is of the empirical distribution of workloads. 

In a setting where performance need not be a clean function of workload features, reducing the number of unknown entities is useful.