Efficient MART-Aided Modeling for Microarchitecture
Design Space Exploration and Performance Prediction
Bin Li
Department of Experimental Statistics
Louisiana State University
Baton Rouge, LA 70803
bli@lsu.edu
Lu Peng
Electrical & Computer Engineering
Louisiana State University
Baton Rouge, LA 70803
lpeng@lsu.edu
Balachandran Ramadass
Electrical & Computer Engineering
Louisiana State University
Baton Rouge, LA 70803
bramad2@lsu.edu
ABSTRACT
Computer architects usually evaluate new designs by cycle-accurate processor simulation. This approach provides detailed insight into processor performance, power consumption and complexity. However, only configurations in a subspace can be simulated in practice due to long simulation times and limited resources, leading to suboptimal conclusions that may not hold in a larger design space. In this paper, we propose an automated performance prediction approach which employs state-of-the-art techniques from experiment design, machine learning and data mining. Our method not only produces highly accurate estimations for unsampled points in the design space, but also provides interpretation tools that help investigators understand performance bottlenecks. According to our experiments, by sampling only 0.02% of the full design space of about 15 million points, the median percentage errors, based on 5000 independent test points, range from 0.32% to 3.12% across 12 benchmarks. Even for the worst-case performance, the percentage errors are within 7% for 10 out of 12 benchmarks. In addition, the proposed model can also help architects find important design parameters and performance bottlenecks.
Categories and Subject Descriptors
C.4 [PERFORMANCE OF SYSTEMS]: Measurement techniques; Modeling techniques.
General Terms
Measurement, Performance, Design, Experimentation.
Keywords
Design Space Exploration; Performance Prediction; MART-Aided Models.
1. INTRODUCTION
Computer architects usually evaluate new designs by employing cycle-accurate processor simulators which provide detailed insight into processor performance, power consumption and complexity. A huge design space is formed by the product of the choices of many microarchitectural design parameters such as processor frequency, issue width, cache size/latency, branch predictor settings, etc. To achieve an optimal processor design, a wide configuration spectrum of the design space has to be tested before making a final decision. However, only configurations in a subspace can be simulated in practice due to long simulation times and limited resources, leading to suboptimal conclusions that may not hold in the whole design space. In addition, the extra parameters brought by chip multiprocessors make this problem more pressing [2][3].
In this paper, we propose to use a state-of-the-art tree-based predictive modeling method, combined with advanced sampling techniques from statistics and machine learning, to explore the microarchitectural design space and predict processor performance. This bridges the gap between simulation requirements and simulation time/resource costs. The proposed method includes the following four components: (1) the maximin space-filling sampling method that selects the initial design representatives from among a large number of design alternatives; (2) the state-of-the-art predictive modeling method Multiple Additive Regression Trees (MART) [1], which builds a nonparametric model with exceptional accuracy while remaining remarkably robust; (3) an active learning method that sequentially selects the most informative design points needed to improve the prediction accuracy; (4) interpretation tools for MART-fitted models that show the importance and partial dependence of design parameters.
According to our experiments on 12 SPEC benchmarks, by sampling 3000 points drawn from a microarchitecture design space with nearly 15 million configurations (up to 0.02 percent of the full design space) for each program, we can summarize the following results:
1. Performance Prediction: Application-specific models predict performance, based on 5000 independently sampled design points, with median percentage errors ranging from 0.32% to 3.12% (average percentage errors ranging from 0.41% to 4.18%).
2. Worst-Case Performance: The worst percentage errors are within 7% for 10 out of 12 benchmarks. The largest worst-case percentage error of our proposed method is 22.55%, for art. This is still much better than that of a linear regression model, which has a worst-case percentage error of 87.69%.
3. Model Interpretation: The proposed model shows that several design factors are more important than others: fetch/issue/commit width, the number of ALU units, L2 cache size, and branch predictor type and size. It also finds a performance bottleneck resulting from a relatively small number of LSQ entries (illustrated by the sketch after this list).
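As a concrete illustration of this kind of interpretation, the sketch below (Python; not the authors' code) fits a gradient-boosted tree model with scikit-learn and reads off variable importances and a partial dependence curve. The synthetic data, the number of parameters and the library choice are assumptions for illustration only.

```python
# Hedged sketch: variable importance and partial dependence from a gradient-
# boosted tree model, mimicking the interpretation tools described in the text.
# The synthetic data and the use of scikit-learn are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
X = rng.integers(0, 8, size=(500, 4)).astype(float)   # 4 hypothetical design parameters
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Relative importance of each design parameter (sums to 1).
print(model.feature_importances_)

# Partial dependence of the prediction on parameter 0, averaged over the others.
pd_result = partial_dependence(model, X, features=[0])
print(pd_result["average"])
```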
Copyright is held by the author/owner(s).
SIGMETRICS’ 08, June 2-6, 2008, Annapolis, Maryland, USA.
ACM 978-1-60558-005-0/08/06.

2. METHODOLOGY
In experiment design, distance-based space-filling sampling methods are popular, especially when we believe that interesting features of the true model are just as likely to be in one part of the experimental region as in another. Among them, the maximin distance design is commonly used. However, since some of the architectural design parameters are nominal (no intrinsic ordering structure) and the others are discrete (having a small number of values), we use the distance defined below before applying the maximin distance criterion. Let $wt_j$ be the weight for the $j$th design parameter:
$$d(\mathbf{x}_1, \mathbf{x}_2) = \sum_{j=1}^{p} wt_j \times I\left[x_{1j} \neq x_{2j}\right],$$

where $wt_j = \log_2(\text{number of levels of the } j\text{th design parameter})$ and $I(A)$ is an indicator function, equal to one when $A$ holds and zero otherwise. Note that the weight for each design parameter is equal to its information entropy with a uniform probability for each of its possible values.
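To make the weighted distance and the maximin criterion concrete, here is a minimal Python sketch (not from the paper): it computes $d(\mathbf{x}_1, \mathbf{x}_2)$ and greedily selects points that maximize the minimum distance to the points already chosen. The greedy strategy, function names and toy parameter levels are assumptions for illustration.

```python
# Hedged sketch: the weighted mismatch distance and a greedy maximin selection of
# initial design points. The greedy strategy is an illustrative assumption; the
# paper only states the maximin criterion itself.
import math
import random

def weighted_distance(x1, x2, levels):
    """d(x1, x2) = sum_j wt_j * I[x1[j] != x2[j]], with wt_j = log2(#levels of parameter j)."""
    return sum(math.log2(levels[j]) for j in range(len(levels)) if x1[j] != x2[j])

def maximin_select(candidates, levels, n_points, seed=0):
    """Greedily pick points that maximize the minimum distance to those already chosen."""
    rng = random.Random(seed)
    selected = [rng.choice(candidates)]          # arbitrary starting point
    while len(selected) < n_points:
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: min(weighted_distance(c, s, levels) for s in selected),
        )
        selected.append(best)
    return selected

# Toy usage: 3 design parameters with 4, 2 and 8 levels each.
levels = [4, 2, 8]
candidates = [(a, b, c) for a in range(4) for b in range(2) for c in range(8)]
initial_design = maximin_select(candidates, levels, n_points=5)
```

In the study, 500 initial design points per workload are selected this way (Section 3).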
In our method, a small number of initial design points are selected based on the maximin distance criterion (maximize the shortest distance among the selected points). The processor performance is measured via benchmark simulations on the selected design points. Then, MART is applied 20 times on the sampled points with random perturbation. The reasons to use MART, an ensemble of trees, are the following: (1) trees are inherently nonparametric and can naturally handle mixed types of input variables, i.e., no assumptions are made regarding the underlying distribution of the input variables, and categorical predictors with either ordinal or non-ordinal structure are supported; (2) trees are adept at capturing non-additive behavior, i.e., complex interactions among predictors are routinely and automatically handled with relatively little input required from the analyst; (3) MART improves the prediction performance of a single tree by using an ensemble of trees.
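A minimal sketch of this step is given below, using scikit-learn's GradientBoostingRegressor as a stand-in for MART (both implement Friedman's gradient boosting [1]) and assuming that the "random perturbation" is bootstrap resampling of the training set with different random seeds; the paper does not spell out the perturbation scheme, so these details are illustrative.

```python
# Hedged sketch: fit 20 gradient-boosted tree models on randomly perturbed copies
# of the simulated sample. Design parameters are assumed to be numerically encoded
# in the array X, with y holding the measured cycle counts.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_mart_ensemble(X, y, n_models=20, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))              # bootstrap resample (assumption)
        model = GradientBoostingRegressor(random_state=int(rng.integers(1 << 31)))
        models.append(model.fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    """Per-point mean and standard deviation of the predictions across the ensemble."""
    preds = np.stack([m.predict(X) for m in models])            # shape (n_models, n_points)
    return preds.mean(axis=0), preds.std(axis=0)
```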
Adaptive sampling, also known as active learning in the machine learning literature, involves sequential sampling schemes that use information gleaned from previous observations to guide the sampling process. Studies have shown that adaptively selecting samples in order to learn a target function can outperform conventional sampling schemes. In our method, each of the MART-fitted models predicts the rest of the points in the design space. These points are sorted according to the coefficient of variation (CoV, the ratio of standard deviation to mean) of the model predictions. The points with maximal CoV (under a minimal pairwise-distance constraint) are selected and their performance is measured. This adaptive sampling process is repeated until some stopping criterion is met (e.g., a time limit or a user pre-specified number of iterations).
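A sketch of one adaptive round follows, reusing the hypothetical weighted_distance and ensemble_predict helpers from the sketches above; the batch size and the minimum pairwise-distance threshold are assumptions (the study adds 500 points per round, Section 3).

```python
# Hedged sketch of one adaptive-sampling round: rank unsampled configurations by
# the coefficient of variation (CoV) of the ensemble predictions and keep the
# highest-CoV points that are not too close to anything already selected.
import numpy as np

def adaptive_batch(models, X_pool, X_sampled, levels, batch_size=500, min_dist=1.0):
    mean, std = ensemble_predict(models, X_pool)       # from the ensemble sketch above
    cov = std / np.abs(mean)                           # coefficient of variation per point
    chosen = []
    for i in np.argsort(-cov):                         # highest CoV first
        candidate = tuple(X_pool[i])
        existing = [tuple(X_pool[j]) for j in chosen] + [tuple(x) for x in X_sampled]
        if all(weighted_distance(candidate, x, levels) >= min_dist for x in existing):
            chosen.append(i)
        if len(chosen) == batch_size:
            break
    return X_pool[chosen]                              # simulate these configurations next
```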
3. EXPERIMENTAL RESULTS
We modified sim-outorder, the out-of-order pipelined simulator in SimpleScalar, into an eight-stage, Alpha 21264-like pipeline. Twelve (eight integer and four floating-point) CPU- and memory-intensive programs from SPEC2000 were selected. To capture typical behavior, we skipped a number of instructions for each SPEC program based on a previous work [4], and then collected the number of execution cycles for the next 100 million instructions. The total design space for each workload contains about 15 million configurations, composed of the cross product of 13 design parameter choices. For each workload, 500 initial design points are sampled based on the maximin distance criterion described in Section 2. Then another 500 points are sampled according to the adaptive sampling scheme described in Section 2, and this sampling process is repeated until 3000 design points have been sampled for each benchmark. Notice that with 3000 points we only explore approximately 0.02% of the total 15 million points in the design space. An independent test set consisting of 5000 points is used to evaluate the prediction performance of the fitted models. The following table shows the average percentage errors (PE) on the twelve benchmarks with roughly 0.0067%, 0.0133% and 0.02% of the space sampled. The mean PE ranges from 0.41% to 4.18% for the 12 benchmarks. For the worst-case performance, the percentage errors are within 7% for 10 out of 12 benchmarks. The results indicate that our model achieves highly accurate prediction and remains robust in the worst-case situation.
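For clarity, the error metric reported in Table 1 can be computed as in the sketch below; y_true and y_pred are hypothetical arrays of measured and predicted cycle counts for the 5000 test configurations.

```python
# Hedged sketch of the reported error metric: mean and maximum percentage error
# over an independent test set (array names are illustrative).
import numpy as np

def percentage_errors(y_true, y_pred):
    pe = 100.0 * np.abs(y_pred - y_true) / y_true
    return pe.mean(), pe.max()
```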
Table 1: Summary of performance prediction errors for the specified percentages of the full space sampled. Max PE is the maximum percentage error among the 5000 test points.

Benchmark   0.0067%           0.0133%           0.020%
            Mean(%)  Max(%)   Mean(%)  Max(%)   Mean(%)  Max(%)
art         6.299    42.79    4.633    24.95    4.179    22.55
bzip        0.734    4.496    0.460    3.328    0.406    3.165
crafty      1.623    13.10    1.018    7.171    0.865    5.529
equake      2.654    18.69    2.260    15.77    2.130    15.04
fma3d       0.912    5.426    0.704    3.362    0.625    2.964
gcc         0.740    4.044    0.491    3.024    0.426    2.256
mcf         0.668    4.988    0.501    4.217    0.456    4.236
parser      0.831    4.905    0.515    3.649    0.420    2.305
swim        1.442    9.588    0.905    5.937    0.659    4.627
twolf       1.826    10.35    1.380    7.533    1.227    6.315
vortex      1.359    13.07    0.925    7.112    0.800    6.885
vpr         0.983    6.929    0.616    4.533    0.529    4.323

4. REFERENCES
[1] J. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, 29:1189-1232, 2001.
[2] E. İpek, S. A. McKee, B. R. de Supinski, M. Schulz, and R. Caruana, "Efficiently exploring architectural design spaces via predictive modeling," ASPLOS XII, Oct. 2006.
[3] B. Lee and D. Brooks, "Accurate and efficient regression modeling for microarchitectural performance and power prediction," ASPLOS XII, Oct. 2006.
[4] S. Sair and M. Charney, "Memory Behavior of the SPEC2000 Benchmark Suite," Tech. Report, IBM Corp., Oct. 2000.