Proceedings ArticleDOI

Automated inference of goal-oriented performance prediction functions

TL;DR: This paper proposes an automated, measurement-based model inference method to derive goal-oriented performance prediction functions, and presents different strategies for automated improvement of the prediction model through adaptive selection of new measurement points based on the model's accuracy.
Abstract: Understanding the dependency between performance metrics (such as response time) and software configuration or usage parameters is crucial in improving software quality. However, the size of most modern systems makes it nearly impossible to provide a complete performance model. Hence, we focus on scenario-specific problems where software engineers require practical and efficient approaches to draw conclusions, and we propose an automated, measurement-based model inference method to derive goal-oriented performance prediction functions. For the practicability of the approach it is essential to derive functional dependencies with the least possible amount of data. In this paper, we present different strategies for automated improvement of the prediction model through an adaptive selection of new measurement points based on the accuracy of the prediction model. In order to derive the prediction models, we apply and compare different statistical methods. Finally, we evaluate the different combinations based on case studies using SAP and SPEC benchmarks.
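
To make the adaptive loop concrete, here is a minimal sketch of measurement-point selection driven by prediction accuracy, assuming a scikit-learn-style regressor; the `measure` stub and the refine-near-worst-residual heuristic are illustrative assumptions, not necessarily the paper's exact strategies.

```python
# Minimal sketch of adaptive measurement-point selection. The selection
# heuristic (refine near the largest residual) is one plausible strategy,
# not necessarily the one used in the paper.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def measure(point):
    """Placeholder for an actual performance measurement (hypothetical)."""
    return float(np.sum(point ** 2))  # stand-in response-time function

def adaptive_inference(candidates, n_init=5, budget=20, target_mape=0.05):
    """candidates: 2-D array of possible measurement points."""
    rng = np.random.default_rng(0)
    idx = list(rng.choice(len(candidates), size=n_init, replace=False))
    X = candidates[idx]
    y = np.array([measure(p) for p in X])
    for _ in range(budget - n_init):
        model = DecisionTreeRegressor(min_samples_leaf=2).fit(X, y)
        residuals = np.abs(model.predict(X) - y)
        mape = np.mean(residuals / np.maximum(np.abs(y), 1e-9))
        if mape <= target_mape:
            break  # prediction function is accurate enough for the goal
        # refine where the model currently fits worst: pick the unmeasured
        # candidate closest to the worst-predicted measured point
        worst = X[np.argmax(residuals)]
        unmeasured = np.setdiff1d(np.arange(len(candidates)), idx)
        nxt = unmeasured[np.argmin(np.linalg.norm(candidates[unmeasured] - worst, axis=1))]
        idx.append(nxt)
        X = np.vstack([X, candidates[nxt]])
        y = np.append(y, measure(candidates[nxt]))
    return DecisionTreeRegressor(min_samples_leaf=2).fit(X, y)

grid = np.random.default_rng(1).uniform(0, 1, size=(100, 2))  # toy candidate space
model = adaptive_inference(grid)
```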
Citations
Proceedings ArticleDOI
11 Nov 2013
TL;DR: This paper proposes a variability-aware approach to performance prediction via statistical learning that works progressively with random samples, without additional effort to detect feature interactions.
Abstract: Configurable software systems allow stakeholders to derive program variants by selecting features. Understanding the correlation between feature selections and performance is important for stakeholders to be able to derive a program variant that meets their requirements. A major challenge in practice is to accurately predict performance based on a small sample of measured variants, especially when features interact. We propose a variability-aware approach to performance prediction via statistical learning. The approach works progressively with random samples, without additional effort to detect feature interactions. Empirical results on six real-world case studies demonstrate an average of 94% prediction accuracy based on small random samples. Furthermore, we investigate why the approach works by a comparative analysis of performance distributions. Finally, we compare our approach to an existing technique and guide users to choose one or the other in practice.
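
A minimal sketch of the core idea, with configurations encoded as 0/1 feature-selection vectors; the abstract does not name the learning method, so the regression-tree model and the toy data below are assumptions.

```python
# Sketch: learn performance from a random sample of measured variants.
# A regression tree can pick up feature interactions without enumerating them.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n_features, n_sample = 10, 30
configs = rng.integers(0, 2, size=(n_sample, n_features))  # sampled variants
# toy ground truth with an interaction between features 0 and 3
perf = 100 + 20 * configs[:, 0] + 15 * configs[:, 3] + 30 * configs[:, 0] * configs[:, 3]

model = DecisionTreeRegressor(min_samples_leaf=2).fit(configs, perf)
new_variant = rng.integers(0, 2, size=(1, n_features))
print("predicted performance:", model.predict(new_variant)[0])
```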

180 citations


Cites background from "Automated inference of goal-oriente..."

  • ...A major challenge in practice is to accurately predict performance based on a small sample of measured variants, especially when features interact....


  • ...Quantifying the performance influence of each individual feature is not sufficient in most cases, as feature interactions may cause unpredictable performance anomalies [19]....


Book Chapter
16 Aug 2004
TL;DR: In this chapter, the authors propose to derive application-specific test cases from architecture designs so that the performance of a distributed application can be tested using the middleware software at early stages of the development process.
Abstract: Performance characteristics, such as response time, throughput and scalability, are key quality attributes of distributed applications. Current practice, however, rarely applies systematic techniques to evaluate performance characteristics. We argue that evaluation of performance is particularly crucial in early development stages, when important architectural choices are made. At first glance, this contradicts the use of testing techniques, which are usually applied towards the end of a project. In this paper, we assume that many distributed systems are built with middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or the Common Object Request Broker Architecture (CORBA). These provide services and facilities whose implementations are available when architectures are defined. We also note that it is the middleware functionality, such as transaction and persistence services, remote communication primitives and threading policy primitives, that dominate distributed system performance. Drawing on these observations, this paper presents a novel approach to performance testing of distributed applications. We propose to derive application-specific test cases from architecture designs so that performance of a distributed application can be tested using the middleware software at early stages of a development process. We report empirical results that support the viability of the approach.
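
A minimal sketch of the underlying idea: exercise middleware-level operations with an architecture-derived workload and record response times before the full application exists. The stubbed remote call and all names below are illustrative; a real setup would invoke actual J2EE/CORBA services.

```python
# Toy early-performance-test harness: time repeated invocations of a
# middleware operation (here a stub) and report latency statistics.
import time
import statistics

def remote_call_stub():
    """Stand-in for a middleware operation, e.g. a transactional request."""
    time.sleep(0.01)  # simulated service + communication latency

def run_test_case(operation, requests=50):
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), max(latencies)

mean_rt, worst_rt = run_test_case(remote_call_stub)
print(f"mean response time: {mean_rt*1000:.1f} ms, worst: {worst_rt*1000:.1f} ms")
```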

137 citations

Proceedings ArticleDOI
09 Nov 2015
TL;DR: This paper adapts two widely-used sampling strategies for performance prediction to the domain of configurable systems and evaluates them in terms of sampling cost, which considers prediction accuracy and measurement effort simultaneously.
Abstract: A key challenge of the development and maintenance of configurable systems is to predict the performance of individual system variants based on the features selected. It is usually infeasible to measure the performance of all possible variants, due to feature combinatorics. Previous approaches predict performance based on small samples of measured variants, but it is still open how to dynamically determine an ideal sample that balances prediction accuracy and measurement effort. In this paper, we adapt two widely-used sampling strategies for performance prediction to the domain of configurable systems and evaluate them in terms of sampling cost, which considers prediction accuracy and measurement effort simultaneously. To generate an initial sample, we introduce a new heuristic based on feature frequencies and compare it to a traditional method based on t-way feature coverage. We conduct experiments on six real-world systems and provide guidelines for stakeholders to predict performance by sampling.
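
The feature-frequency heuristic is not spelled out in the abstract; the following is one simplified reading of it: grow a random sample until every feature has been observed both enabled and disabled a minimum number of times. A real implementation would also have to respect feature-model constraints.

```python
# Simplified sketch of a feature-frequency-style initial sample (an
# illustrative reading, not the paper's exact algorithm).
import numpy as np

def frequency_sample(n_features, k=2, seed=0):
    rng = np.random.default_rng(seed)
    sample = []
    enabled = np.zeros(n_features, dtype=int)
    disabled = np.zeros(n_features, dtype=int)
    # keep adding random configurations until each feature has been seen
    # both enabled and disabled at least k times
    while enabled.min() < k or disabled.min() < k:
        config = rng.integers(0, 2, size=n_features)
        sample.append(config)
        enabled += config
        disabled += 1 - config
    return np.array(sample)

print(frequency_sample(8).shape)  # (sample size, number of features)
```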

130 citations


Cites background or methods from "Automated inference of goal-oriented performance prediction functions"

  • ...[19] analyzed various statistical inference techniques to predict the performance of configurable systems....


  • ...In previous work, prediction accuracy was the main evaluation metric used to estimate the utility of the prediction models [5], [17], [19]....


Journal ArticleDOI
TL;DR: This paper proposes DECART, a data-efficient learning approach that combines several techniques from machine learning and statistics for performance prediction of configurable systems, and introduces a sample quality metric enabling a quantitative analysis of the quality of a sample for performance prediction.
Abstract: Many software systems today are configurable, offering customization of functionality by feature selection. Understanding how performance varies in terms of feature selection is key for selecting appropriate configurations that meet a set of given requirements. Due to a huge configuration space and the possibly high cost of performance measurement, it is usually not feasible to explore the entire configuration space of a configurable system exhaustively. It is thus a major challenge to accurately predict performance based on a small sample of measured system variants. To address this challenge, we propose a data-efficient learning approach, called DECART, that combines several techniques of machine learning and statistics for performance prediction of configurable systems. DECART builds, validates, and determines a prediction model based on an available sample of measured system variants. Empirical results on 10 real-world configurable systems demonstrate the effectiveness and practicality of DECART. In particular, DECART achieves a prediction accuracy of 90% or higher based on a small sample, whose size is linear in the number of features. In addition, we propose a sample quality metric and introduce a quantitative analysis of the quality of a sample for performance prediction.
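
A plausible sketch of the recipe the abstract describes: build a CART model on a small sample and use resampling-based validation to select its parameters. The parameter grid and toy data are assumptions, not DECART's exact configuration.

```python
# Sketch: CART + cross-validated parameter tuning on a sample whose size
# is linear in the number of features, as the abstract suggests.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
n_features = 12
X = rng.integers(0, 2, size=(3 * n_features, n_features))  # sample size linear in #features
y = 50 + 10 * X[:, 1] + 25 * X[:, 2] * X[:, 5] + rng.normal(0, 1, len(X))

search = GridSearchCV(
    DecisionTreeRegressor(),
    param_grid={"min_samples_leaf": [1, 2, 4], "max_depth": [None, 4, 8]},
    cv=5,  # resampling-based model validation
    scoring="neg_mean_absolute_error",
).fit(X, y)
print("chosen parameters:", search.best_params_)
```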

88 citations


Cites background or methods from "Automated inference of goal-oriented performance prediction functions"

  • ...Typically, only a limited set of configurations can be measured in practice, either by simulation [45] or by monitoring in the field [42]....


  • ...[45] presented an approach for the automated improvement of performance-prediction functions by three measurement-point-selection strategies based on the prediction accuracy....


  • ..., response time or throughput) is one of the most important non-functional properties, because it directly affects user perception and cost [45]....


Proceedings ArticleDOI
09 Nov 2015
TL;DR: This paper proposes a novel algorithm based on the Fourier transform that can make predictions for any configurable software system with theoretical guarantees of accuracy and confidence level specified by the user, while using a minimum number of samples up to a constant factor.
Abstract: Understanding how performance varies across a large number of variants of a configurable software system is important for helping stakeholders to choose a desirable variant. Given a software system with n optional features, measuring all its 2^n possible configurations to determine their performances is usually infeasible. Thus, various techniques have been proposed to predict software performances based on a small sample of measured configurations. We propose a novel algorithm based on Fourier transform that is able to make predictions of any configurable software system with theoretical guarantees of accuracy and confidence level specified by the user, while using minimum number of samples up to a constant factor. Empirical results on the case studies constructed from real-world configurable systems demonstrate the effectiveness of our algorithm.
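
For intuition, here is a sketch of the Fourier view over configurations encoded as vectors in {-1, +1}^n: each coefficient f_hat(S) = E[f(x) * prod_{i in S} x_i] can be estimated from random samples, and keeping the large coefficients yields a predictor. The toy function and the degree-2 cutoff are illustrative, not the paper's algorithm.

```python
# Estimate Boolean-Fourier coefficients of a toy performance function
# from random samples and keep the dominant ones.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 6
samples = rng.choice([-1, 1], size=(500, n))
f = lambda x: 10 + 4 * x[:, 0] + 3 * x[:, 2] * x[:, 4]  # toy performance
vals = f(samples)

def coeff(subset):
    """Empirical estimate of f_hat(S) = E[f(x) * prod_{i in S} x_i]."""
    chi = np.prod(samples[:, subset], axis=1) if subset else np.ones(len(samples))
    return np.mean(vals * chi)

# estimate all coefficients up to degree 2 and keep the largest ones
subsets = [()] + [(i,) for i in range(n)] + list(itertools.combinations(range(n), 2))
estimates = {S: coeff(list(S)) for S in subsets}
top = sorted(estimates.items(), key=lambda kv: -abs(kv[1]))[:4]
print(top)  # should recover the constant, x0, and the x2*x4 interaction
```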

75 citations


Additional excerpts

  • ...Take execution time as an example: since it is simply infeasible to run all 2^n configurations for a system with n features, the key challenge is to accurately predict the performance of the system on all configurations by measuring only a small number of sample configurations, as is being actively studied in many recent works [6], [13], [14], [15], [16], [17], [20], [4]....


References
Book
28 Jul 2013
TL;DR: This book describes the important ideas in data mining, machine learning, and related fields in a common conceptual framework; the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p > n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations

Book
John R. Koza1
01 Jan 1992
TL;DR: This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.
Abstract: Background on genetic algorithms, LISP, and genetic programming; hierarchical problem-solving; introduction to automatically defined functions (the two-boxes problem); problems that straddle the breakeven point for computational effort; Boolean parity functions; determining the architecture of the program; the lawnmower problem; the bumblebee problem; the increasing benefits of ADFs as problems are scaled up; finding an impulse response function; artificial ant on the San Mateo trail; obstacle-avoiding robot; the minesweeper problem; automatic discovery of detectors for letter recognition; flushes and four-of-a-kinds in a pinochle deck; introduction to biochemistry and molecular biology; prediction of transmembrane domains in proteins; prediction of omega loops in proteins; lookahead version of the transmembrane problem; evolutionary selection of the architecture of the program; evolution of primitives and sufficiency; evolutionary selection of terminals; evolution of closure; simultaneous evolution of architecture, primitive functions, terminals, sufficiency, and closure; the role of representation and the lens effect. Appendices: list of special symbols; list of special functions; list of type fonts; default parameters; computer implementation; annotated bibliography of genetic programming; electronic mailing list and public repository.

13,487 citations

Journal ArticleDOI
TL;DR: The Elements of Statistical Learning: Data Mining, Inference, and Prediction is a widely used reference on statistical learning, covering data mining, inference, and prediction.
Abstract: Review of The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association, Vol. 99, No. 466, p. 567 (2004).

10,549 citations

Journal ArticleDOI
TL;DR: In this article, a new method is presented for flexible regression modeling of high-dimensional data; the model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data.
Abstract: A new method is presented for flexible regression modeling of high dimensional data. The model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data. This procedure is motivated by the recursive partitioning approach to regression and shares its attractive properties. Unlike recursive partitioning, however, this method produces continuous models with continuous derivatives. It has more power and flexibility to model relationships that are nearly additive or involve interactions in at most a few variables. In addition, the model can be represented in a form that separately identifies the additive contributions and those associated with the different multivariable interactions.
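
A tiny sketch of the basis such a model is built from: hinge functions max(0, x - t) and max(0, t - x), and products of hinges for interactions. In the actual method the knots and products are chosen from the data via forward and backward passes; here they are fixed for illustration.

```python
# Evaluate a toy model built from hinge basis functions: one additive
# term plus a two-variable interaction term (knots chosen arbitrarily).
import numpy as np

def hinge(x, t, sign=+1):
    """Hinge basis function max(0, sign * (x - t))."""
    return np.maximum(0.0, sign * (x - t))

x1 = np.linspace(0, 10, 5)
x2 = np.linspace(0, 1, 5)
model = lambda a, b: 2.0 + 0.8 * hinge(a, 3.0) + 1.5 * hinge(a, 6.0) * hinge(b, 0.4)
print(model(x1, x2))
```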

6,651 citations

Proceedings Article
01 Jan 1990

2,593 citations


"Automated inference of goal-oriente..." refers background in this paper

  • ...The approach consists of five central building blocks (based on [10]): 1....


  • ...Our overall approach aims at reducing this manual effort as much as possible and consists of five major steps (based on [10]): (i) defining the context and the goal of the performance evaluation, (ii) specifying potential performance influencing parameters and setting up a measurement environment....
