
Proceedings ArticleDOI

Efficient mart-aided modeling for microarchitecture design space exploration and performance prediction

02 Jun 2008 - Vol. 36, Iss. 1, pp. 439-440



Efficient MART-Aided Modeling for Microarchitecture
Design Space Exploration and Performance Prediction
Bin Li
Department of Experimental Statistics
Louisiana State University
Baton Rouge, LA 70803
bli@lsu.edu
Lu Peng
Electrical & Computer Engineering
Louisiana State University
Baton Rouge, LA 70803
lpeng@lsu.edu
Balachandran Ramadass
Electrical & Computer Engineering
Louisiana State University
Baton Rouge, LA 70803
bramad2@lsu.edu
ABSTRACT
Computer architects usually evaluate new designs by cycle-accurate processor simulation. This approach provides detailed insight into processor performance, power consumption and complexity. However, only configurations in a subspace can be simulated in practice due to long simulation times and limited resources, leading to suboptimal conclusions that might not apply to a larger design space. In this paper, we propose an automated performance prediction approach which employs state-of-the-art techniques from experiment design, machine learning and data mining. Our method not only produces highly accurate estimates for unsampled points in the design space, but also provides interpretation tools that help investigators understand performance bottlenecks. According to our experiments, by sampling only 0.02% of the full design space of about 15 million points, the median percentage errors, based on 5000 independent test points, range from 0.32% to 3.12% across 12 benchmarks. Even for the worst-case performance, the percentage errors are within 7% for 10 out of 12 benchmarks. In addition, the proposed model can also help architects find important design parameters and performance bottlenecks.
Categories and Subject Descriptors
C.4 [PERFORMANCE OF SYSTEMS]: Measurement techniques; Modeling techniques.
General Terms
Measurement, Performance, Design, Experimentation.
Keywords
Design Space Exploration; Performance Prediction; MART-Aided Models.
1. INTRODUCTION
Computer architects usually evaluate new designs by employing cycle-accurate processor simulators, which provide detailed insight into processor performance, power consumption and complexity. A huge design space is formed by the cross product of the choices of many microarchitectural design parameters such as processor frequency, issue width, cache size/latency, branch predictor settings, etc. To achieve an optimal processor design, a wide configuration spectrum of the design space has to be tested before making a final decision. However, only configurations in a subspace can be simulated in practice due to long simulation times and limited resources, leading to suboptimal conclusions that might not apply to the whole design space. In addition, the extra parameters introduced by chip multiprocessors make this problem even more pressing [2][3].
In this paper, we propose to use a state-of-the-art tree-based predictive modeling method, combined with advanced sampling techniques from statistics and machine learning, to explore the microarchitectural design space and predict processor performance. This bridges the gap between simulation requirements and simulation time/resource costs. The proposed method includes the following four components: (1) a maximin space-filling sampling method that selects the initial design representatives from among a large number of design alternatives; (2) the state-of-the-art predictive modeling method Multiple Additive Regression Trees (MART) [1], which builds a nonparametric model with exceptional accuracy while remaining remarkably robust; (3) an active learning method that sequentially selects the most informative design points needed to improve prediction accuracy; (4) interpretation tools for MART-fitted models that show the importance and partial dependence of design parameters.
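As a hedged illustration of component (4) only, the sketch below computes relative variable importance and a one-dimensional partial dependence from a boosted-tree model, using scikit-learn's GradientBoostingRegressor as a stand-in for MART; the toy data and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Toy stand-in data: rows are numerically encoded configurations of 13 design
# parameters; y plays the role of the simulated cycle count.
rng = np.random.default_rng(0)
X = rng.integers(0, 8, size=(500, 13)).astype(float)
y = 2.0 * X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

importance = model.feature_importances_             # relative importance per design parameter
pdep = partial_dependence(model, X, features=[0])   # marginal effect of parameter 0
print(importance.round(3), pdep["average"][0])
```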
According to our experiments on 12 SPEC benchmarks, by sampling 3000 points drawn from a microarchitecture design space with nearly 15 million configurations (at most 0.02 percent of the full design space) for each program, we can summarize the following results:
1. Performance Prediction: Application-specific models predict performance, based on 5000 independently sampled design points, with median percentage errors ranging from 0.32% to 3.12% (average percentage errors range from 0.41% to 4.18%).
2. Worst-Case Performance: The worst-case percentage errors are within 7% for 10 out of 12 benchmarks. The largest worst-case percentage error of our proposed method is 22.55%, for art. This is still much better than that of a linear regression model, which has a worst-case percentage error of 87.69%.
3. Model Interpretation: The proposed model shows that several design factors are more important than others: Fetch/Issue/Commit width, the number of ALU units, L2 cache size, and branch predictor type and size. It also finds a performance bottleneck resulting from a relatively small number of LSQ entries.
Copyright is held by the author/owner(s).
SIGMETRICS'08, June 2-6, 2008, Annapolis, Maryland, USA.
ACM 978-1-60558-005-0/08/06.

2. METHODOLOGY
In experiment design, distance-based space-filling sampling methods are popular, especially when we believe that interesting features of the true model are just as likely to lie in one part of the experimental region as in another. Among them, the maximin distance design is commonly used. However, since some of the architectural design parameters are nominal (no intrinsic ordering structure) and the others are discrete (having a small number of values), we use the distance defined below before applying the maximin distance criterion.
Let wt_j denote the weight for the j-th design parameter. The distance between two configurations x_1 and x_2 is defined as

d(x_1, x_2) = \sum_{j=1}^{p} wt_j \times I[x_{1j} \neq x_{2j}],

where wt_j = \log_2(\text{number of levels of the } j\text{-th design parameter}) and I(A) is an indicator function, equal to one when A holds and zero otherwise. Note that the weight for each design parameter is equal to its information entropy under a uniform probability over each of its possible values.
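As a concrete illustration (not the authors' code), the sketch below implements this weighted mismatch distance and a greedy maximin selection; the level counts, pool size, and greedy heuristic are illustrative assumptions.

```python
import numpy as np

def config_distance(x1, x2, weights):
    """d(x1, x2) = sum_j wt_j * I[x1_j != x2_j]  (weighted mismatch distance)."""
    return float(np.sum(weights * (np.asarray(x1) != np.asarray(x2))))

def maximin_select(candidates, weights, n_points, seed=0):
    """Greedy heuristic: repeatedly add the candidate farthest from the chosen set,
    approximately maximizing the shortest pairwise distance."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    chosen = [int(rng.integers(len(candidates)))]               # arbitrary starting point
    min_dist = np.array([config_distance(candidates[chosen[0]], c, weights)
                         for c in candidates])
    while len(chosen) < n_points:
        nxt = int(np.argmax(min_dist))                          # farthest from the chosen set
        chosen.append(nxt)
        d_new = np.array([config_distance(candidates[nxt], c, weights)
                          for c in candidates])
        min_dist = np.minimum(min_dist, d_new)                  # update shortest distances
    return candidates[chosen]

# wt_j = log2(number of levels of parameter j), i.e. its entropy under a uniform
# distribution; the level counts below are illustrative, not the paper's settings.
levels = [4, 8, 2, 16, 4]
weights = np.log2(levels)
rng = np.random.default_rng(1)
pool = np.column_stack([rng.integers(0, l, size=1000) for l in levels])
initial_design = maximin_select(pool, weights, n_points=10)
```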
In our method, a small number of initial design points are selected based on the maximin distance criterion (maximize the shortest distance among the selected points). Processor performance is measured via benchmark simulations on the selected design points. Then, MART is applied 20 times to the sampled points with random perturbation. The reasons for using MART, an ensemble of trees, are the following: (1) trees are inherently nonparametric and handle mixed types of input variables naturally, i.e., no assumptions are made regarding the underlying distribution of the input variables, and categorical predictors with either ordinal or non-ordinal structure are supported; (2) trees are adept at capturing non-additive behavior, i.e., complex interactions among predictors are routinely and automatically handled with relatively little input required from the analyst; (3) MART improves prediction performance over a single tree by using an ensemble of trees.
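A minimal sketch of this step, assuming scikit-learn's GradientBoostingRegressor as a stand-in for MART and bootstrap resampling as the "random perturbation"; the hyperparameters are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_mart_ensemble(X, y, n_models=20, seed=0):
    """Fit n_models boosted-tree regressors, each on a randomly perturbed
    (bootstrap-resampled) copy of the simulated design points."""
    # Nominal design parameters are assumed to be numerically encoded in X.
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap perturbation
        m = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                      max_depth=4, random_state=i)
        m.fit(X[idx], y[idx])
        models.append(m)
    return models
```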
Adaptive sampling, also known as active learning in the machine learning literature, involves sequential sampling schemes that use information gleaned from previous observations to guide the sampling process. Studies have shown that adaptively selecting samples in order to learn a target function can outperform conventional sampling schemes. In our method, each of the MART-fitted models predicts the rest of the points in the design space. These points are sorted according to the coefficient of variation (CoV, the ratio of standard deviation to mean) of the model predictions. The points with maximal CoV (under a minimal pairwise-distance constraint) are selected and their performance is measured. This adaptive sampling process is repeated until some stopping criterion is met (e.g., a time limit or a user-specified number of iterations).
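A sketch of one iteration of this adaptive-sampling loop, reusing config_distance and the model list from the sketches above; the batch size and distance threshold are illustrative assumptions.

```python
import numpy as np

def select_by_cov(models, X_pool, weights, batch_size=500, min_pair_dist=4.0):
    """Rank unsampled configurations by the coefficient of variation (CoV) of the
    ensemble predictions and pick the most uncertain ones, keeping the selected
    points mutually distant."""
    X_pool = np.asarray(X_pool, dtype=float)
    preds = np.stack([m.predict(X_pool) for m in models])   # shape: (n_models, n_pool)
    cov = preds.std(axis=0) / np.abs(preds.mean(axis=0))    # CoV = std / mean
    chosen = []
    for i in np.argsort(cov)[::-1]:                         # most uncertain first
        if len(chosen) == batch_size:
            break
        if all(config_distance(X_pool[i], X_pool[j], weights) >= min_pair_dist
               for j in chosen):                            # pairwise-distance constraint
            chosen.append(int(i))
    return chosen                                           # pool indices to simulate next
```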
3. EXPERIMENTAL RESULTS
We modified sim-outorder, the out-of-order pipelined simulator in SimpleScalar, to model an eight-stage, Alpha 21264-like pipeline. Twelve (eight integer and four floating-point) CPU- and memory-intensive programs from SPEC2000 were selected. To capture typical behavior, we skipped a number of instructions for each SPEC program based on previous work [4], and then collected the number of execution cycles for the next 100 million instructions. The total design space for each workload contains about 15 million configurations, formed by the cross product of 13 design parameter choices. For each workload, 500 initial design points are sampled based on the maximin distance criterion described in Section 2. Then another 500 points are sampled according to the adaptive sampling scheme described in Section 2. This sampling process is repeated until 3000 design points have been sampled for each benchmark. Note that with 3000 points we explore only approximately 0.02% of the total 15 million points in the design space. An independent test set consisting of 5000 points is used to evaluate the prediction performance of the fitted models. Table 1 shows the mean and maximum percentage errors (PE) on the twelve benchmarks with roughly 0.0067%, 0.0133% and 0.02% of the space sampled. The mean PE ranges from 0.41% to 4.18% across the 12 benchmarks. For the worst-case performance, the percentage errors are within 7% for 10 out of 12 benchmarks. The results indicate that our model achieves highly accurate prediction and remains robust in the worst case.
Table 1: Summary of performance prediction errors (mean and max PE) at the specified percentages of the full design space sampled. Max PE is the maximum percentage error among the 5000 test points.

Benchmark    0.0067%            0.0133%            0.020%
             Mean (%)  Max (%)  Mean (%)  Max (%)  Mean (%)  Max (%)
art          6.299     42.79    4.633     24.95    4.179     22.55
bzip         0.734     4.496    0.460     3.328    0.406     3.165
crafty       1.623     13.10    1.018     7.171    0.865     5.529
equake       2.654     18.69    2.260     15.77    2.130     15.04
fma3d        0.912     5.426    0.704     3.362    0.625     2.964
gcc          0.740     4.044    0.491     3.024    0.426     2.256
mcf          0.668     4.988    0.501     4.217    0.456     4.236
parser       0.831     4.905    0.515     3.649    0.420     2.305
swim         1.442     9.588    0.905     5.937    0.659     4.627
twolf        1.826     10.35    1.380     7.533    1.227     6.315
vortex       1.359     13.07    0.925     7.112    0.800     6.885
vpr          0.983     6.929    0.616     4.533    0.529     4.323
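For reference, a minimal sketch of how the error metrics in Table 1 can be computed from the ensemble-mean prediction on the held-out test set; the variable names are illustrative assumptions.

```python
import numpy as np

def prediction_errors(models, X_test, y_test):
    """Mean, median and maximum percentage error of the ensemble-mean prediction."""
    X_test = np.asarray(X_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    pred = np.mean([m.predict(X_test) for m in models], axis=0)   # ensemble mean
    pe = 100.0 * np.abs(pred - y_test) / y_test                   # percentage error
    return {"mean_pe": float(pe.mean()),
            "median_pe": float(np.median(pe)),
            "max_pe": float(pe.max())}
```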
4. REFERENCES
[1] J. Friedman, “Greedy function approximation: a gradient
boosting machine,” The Annals of Statistics, 29: 1189-1232,
2001.
[2] E. İpek, S. A. McKee, B. R. de Supinski, M. Schulz, and R. Caruana, "Efficiently exploring architectural design spaces via predictive modeling," ASPLOS XII, Oct. 2006.
[3] B. Lee and D. Brooks, "Accurate and efficient regression modeling for microarchitectural performance and power prediction," ASPLOS XII, Oct. 2006.
[4] S. Sair and M. Charney, "Memory Behavior of the SPEC2000 Benchmark Suite," Tech. Report, IBM Corp., Oct. 2000.
Citations

Journal ArticleDOI
Bin Li, Lu Peng, Balachandran Ramadass
TL;DR: A performance prediction approach which employs state-of-the-art techniques from experiment design, machine learning and data mining is proposed which generates highly accurate estimations for unsampled points in the design space and shows the robustness for the worst-case prediction.
Abstract: Computer architects usually evaluate new designs using cycle-accurate processor simulation. This approach provides a detailed insight into processor performance, power consumption and complexity. However, only configurations in a subspace can be simulated in practice due to long simulation time and limited resource, leading to suboptimal conclusions which might not be applied to a larger design space. In this paper, we propose a performance prediction approach which employs state-of-the-art techniques from experiment design, machine learning and data mining. According to our experiments on single and multi-core processors, our prediction model generates highly accurate estimations for unsampled points in the design space and show the robustness for the worst-case prediction. Moreover, the model provides quantitative interpretation tools that help investigators to efficiently tune design parameters and remove performance bottlenecks.

36 citations



Journal ArticleDOI
Qi Guo, Tianshi Chen, Yunji Chen, Ling Li, et al.
TL;DR: The key of this approach is utilizing inherent program characteristics as prior knowledge (in addition to microarchitectural configurations) to build a universal predictive model, so that no additional simulation is required for evaluating new programs on new configurations.
Abstract: Predictive modeling is an emerging methodology for microarchitectural design space exploration. However, this method suffers from high costs to construct predictive models, especially when unseen programs are employed in performance evaluation. In this paper, we propose a fast predictive model-based approach for microarchitectural design space exploration. The key of our approach is utilizing inherent program characteristics as prior knowledge (in addition to microarchitectural configurations) to build a universal predictive model. Thus, no additional simulation is required for evaluating new programs on new configurations. Besides, due to employed model tree technique, we can provide insights of the design space for early design decisions. Experimental results demonstrate that our approach is comparable to previous approaches regarding their prediction accuracies of performance/energy. Meanwhile, the training time of our approach achieves 7.6-11.8x speedup over previous approaches for each workload. Moreover, the training costs of our approach can be further reduced via instrumentation technique.

23 citations


Journal ArticleDOI
Qingzhao Yu, Bin Li, Zhide Fang, Lu Peng
Abstract: The evaluation of new processor designs is an important issue in electrical and computer engineering. Architects use simulations to evaluate designs and to understand trade-offs and interactions among design parameters. However, due to the lengthy simulation time and limited resources, it is often practically impossible to simulate a full factorial design space. Effective sampling methods and predictive models are required. In this paper, the authors propose an automated performance predictive approach which employs an adaptive sampling scheme that interactively works with the predictive model to select samples for simulation. These samples are then used to build Bayesian additive regression trees, which in turn are used to predict the whole design space. Both real data analysis and simulation studies show that the method is effective in that, though sampling at very few design points, it generates highly accurate predictions on the unsampled points. Furthermore, the proposed model provides quantitative interpretation tools with which investigators can efficiently tune design parameters in order to improve processor performance. The Canadian Journal of Statistics 38: 136-152; 2010 © 2010 Statistical Society of Canada

6 citations


Journal ArticleDOI
Qingzhao Yu, Bin Li, Zhide Fang, Lu Peng
TL;DR: An adaptive sampling scheme that interactively works with predictive models to sequentially select design points for computer experiments and gives more accurate predictions on the unsampled points than the models built on samples from other methods such as random sampling, space‐filling designs and some adaptive sampling methods.
Abstract: Computer experiments have become increasingly important in several different industries. These experiments save resources by exploring different designs without necessitating real hardware manufacturing. However, computer experiments usually require lengthy simulation times and powerful computational capacity. Therefore, it is often pragmatically impossible to run experiments on a complete design space. In this paper, we propose an adaptive sampling scheme that interactively works with predictive models to sequentially select design points for computer experiments. The selected samples are used to build predictive models, which in turn guide further sampling and predict the entire design space. For illustration, we use Bayesian additive regression trees (BART), multiple additive regression trees (MART), treed Gaussian process and Gaussian process to guide the proposed sampling method. Both real data and simulation studies show that our sampling method is effective in that (i) it can be used with different predictive models; (ii) it can select multiple design points without repeatedly refitting the predictive models, which makes parallel simulations possible and (iii) the predictive model built on its generated samples gives more accurate predictions on the unsampled points than the models built on samples from other methods such as random sampling, space-filling designs and some adaptive sampling methods. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 © 2012 Wiley Periodicals, Inc.

3 citations


Thesis
01 Jan 2009
Balachandran Ramadass, "Processor Design Space Exploration and Performance Prediction," Master's Thesis, Louisiana State University, 2009. https://digitalcommons.lsu.edu/gradschool_theses/1030

1 citation


Cites background or methods from "Efficient mart-aided modeling for m..."



  • ...[17] explored the multi-dimensional design space across a range of possible chip sizes...


  • ...Li, Peng and Ramadass[17] approached the microarchitecture design space exploration and performance prediction problem using MART Model....




References

Journal ArticleDOI
Jerome H. Friedman
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Abstract: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion.Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Shapire and Friedman, Hastie and Tibshirani are discussed.

12,602 citations


"Efficient mart-aided modeling for m..." refers background or methods in this paper

  • ...The proposed method includes the following four components: (1) the maximin spacefilling sampling method that selects the initial design representatives from among a large amount of design alternatives; (2) the state-of-the-art predictive modeling method Multiple Additive Regression Trees (MART) [1] that builds a nonparametric model with exceptional accuracy while remaining remarkably robust; (3) an active learning method that selects the most informative design points needed to improve the prediction accuracy sequentially; (4) interpretation tools for MART-fitted models that shows the importance and partial dependence of design parameters....

    [...]

  • ...The reason to use MART, an ensemble of trees, is the following: (1) trees are inherently nonparametric and can handle mixed-type of input variables naturally, i....

    [...]


Proceedings ArticleDOI
Benjamin C. Lee, David Brooks
20 Oct 2006
TL;DR: This paper derives and validate regression models for performance and power, and presents optimizations for a baseline regression model to obtain application-specific models to maximize accuracy in performance prediction and regional power models leveraging only the most relevant samples from the microarchitectural design space to maximizing accuracy in power prediction.
Abstract: We propose regression modeling as an efficient approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This paper addresses fundamental challenges in microarchitectural simulation cost by reducing the number of required simulations and using simulated results more effectively via statistical modeling and inference.Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving median error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) application-specific models to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing sensitivity to the number of samples simulated for model formulation, we find fewer than 4,000 samples from a design space of approximately 22 billion points are sufficient. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.

446 citations


Proceedings ArticleDOI
Engin Ipek, Sally A. McKee, Rich Caruana, Bronis R. de Supinski, et al.
20 Oct 2006
TL;DR: This work builds accurate, confident predictive design-space models that produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions.
Abstract: Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1% sample drawn from a CMP design space with 250K points and up to 55x performance swings among different system configurations, our models predict performance with only 4-5% error on average. Our approach combines with techniques to reduce time per simulation, achieving net time savings of three-four orders of magnitude.

342 citations


"Efficient mart-aided modeling for m..." refers background or methods in this paper

  • ...The proposed method includes the following four components: (1) the maximin spacefilling sampling method that selects the initial design representatives from among a large amount of design alternatives; (2) the state-of-the-art predictive modeling method Multiple Additive Regression Trees (MART) [1] that builds a nonparametric model with exceptional accuracy while remaining remarkably robust; (3) an active learning method that selects the most informative design points needed to improve the prediction accuracy sequentially; (4) interpretation tools for MART-fitted models that shows the importance and partial dependence of design parameters....

    [...]

  • ...no assumptions are made regarding the underlying distribution of values of the input variables, as well as categorical predictors with either ordinal or non-ordinal structure; (2) trees are adept at capturing non-additive behavior, i....

    [...]



Performance Metrics
Number of citations received by the paper in previous years:

Year    Citations
2013    1
2012    1
2010    1
2009    2