Speedup your analytics: automatic parameter tuning for databases and big data systems
Citations
Resource provisioning framework for mapreduce jobs with performance goals
A Survey on Automatic Parameter Tuning for Big Data Processing Systems
AI Meets Database: AI4DB and DB4AI
Investigating Approaches of Integrating BIM, IoT, and Facility Management for Renovating Existing Buildings: A Review
Database Meets Artificial Intelligence: A Survey
References
Apache Hadoop YARN: yet another resource negotiator
A comparison of approaches to large-scale data analysis
Starfish: A Self-tuning System for Big Data Analytics.
ARIA: automatic resource inference and allocation for mapreduce environments
Frequently Asked Questions (13)
Q2. What are the main aspects of the configuration parameter tuning?
Configuration parameters range from low-level memory settings and thread counts to higher-level decisions such as scheduling and resource management.
Q3. What are the performance benefits of tuning?
The performance benefits of tuning are well-known in the industry, sometimes measured in orders of magnitude of improvement [24], while bad configurations (or misconfiguration) can lead to significantly degraded performance [27].
Q4. What are the main themes of the article?
The continuous growth of the World Wide Web, Internet of Things (IoT), e-commerce, and other applications is generating massive and ever-increasing amounts of raw data every day.
Q5. What is the main theme of the paper?
The use of automated configuration parameter tuning techniques is a promising, yet challenging, approach to optimizing system performance.
Q6. What are the main topics of the tutorial?
In the last part of the tutorial, the authors focus on open challenges that must be addressed to ensure the success of automatic parameter tuning, especially when taking into account the growth of scale and complexity of big data analytics systems.
Q7. What is the motivation for this tutorial?
The authors motivate the need for automatic parameter tuning with several applications/scenarios in the era of Big Data and cloud computing.
Q8. What are the main challenges of automatic parameter tuning?
The major challenges include three aspects as follows: (i) Large and complex parameter space: Database systems often have hundreds of tuning knobs [24], while Hadoop and Spark have around 200 configurable parameters each [15].
Q9. How many highly-cited approaches have been identified?
The authors have identified over 40 highly-cited approaches (e.g., [13,15]) spanning their six categories and published within the last 10 years.
Q10. What is the purpose of this tutorial?
This tutorial is intended for a broad audience, ranging from academic researchers to industrial data scientists, who want to understand the impact of parameters on performance in big data analytics systems.
Q11. What are the main learning outcomes of this tutorial?
(2) An overview of tuning approaches used by current database and big data platforms, including rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive approaches.
Q12. What are the main reasons why a system can be optimized?
Improper settings of configuration parameters are shown to have detrimental effects on the overall system performance and stability [9, 13].
Q13. What are the main characteristics of the parameters that are used in Hadoop?
This tutorial will perform a comprehensive study of existing parameter tuning approaches, which tackle various challenges towards high resource utilization, fast response time, and cost-effectiveness.
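To make the experiment-driven category of tuning approaches more concrete, here is a minimal sketch of one simple strategy, random search over a configuration space. It is an illustration only and is not taken from the tutorial. The parameter names (`memory_mb`, `num_threads`), their ranges, and the `simulated_runtime` cost function are all hypothetical stand-ins; a real tuner would run an actual workload on the target system and measure its runtime.

```python
import random

# Hypothetical search space: names and ranges are illustrative only,
# not the configuration keys of any real system.
SEARCH_SPACE = {
    "memory_mb": (512, 8192),
    "num_threads": (1, 32),
}


def simulated_runtime(config):
    """Stand-in for executing a real job; returns a synthetic runtime in seconds."""
    # Made-up cost surface whose minimum (100 s) is at 4096 MB and 16 threads.
    mem_penalty = abs(config["memory_mb"] - 4096) / 4096
    thread_penalty = abs(config["num_threads"] - 16) / 16
    return 100.0 * (1.0 + mem_penalty + thread_penalty)


def random_search(measure, space, trials, seed=0):
    """Sample configurations uniformly at random and keep the fastest one seen."""
    rng = random.Random(seed)
    best_config, best_runtime = None, float("inf")
    for _ in range(trials):
        # Draw one integer value per parameter from its inclusive range.
        config = {name: rng.randint(low, high) for name, (low, high) in space.items()}
        runtime = measure(config)
        if runtime < best_runtime:
            best_config, best_runtime = config, runtime
    return best_config, best_runtime


best_config, best_runtime = random_search(simulated_runtime, SEARCH_SPACE, trials=50)
print(best_config, round(best_runtime, 1))
```

Each trial here costs one full job execution, which is why the size of the parameter space matters so much in practice: with hundreds of parameters, blind sampling quickly becomes too expensive, and the other categories (cost models, simulation, machine learning) aim to reduce the number of real runs needed.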