Proceedings ArticleDOI

Compiling for niceness: mitigating contention for QoS in warehouse scale computers

TLDR
QoS-Compile is presented, the first compilation approach that statically manipulates application contentiousness to enable the co-location of applications with varying QoS requirements, and as a result, can greatly improve machine utilization.
Abstract
As the class of datacenters recently coined as warehouse scale computers (WSCs) continues to leverage commodity multicore processors with increasing core counts, there is a growing need to consolidate various workloads on these machines to fully utilize their computation power. However, it is well known that when multiple applications are co-located on a multicore machine, contention for shared memory resources can cause severe cross-core performance interference. To ensure that the quality of service (QoS) of user-facing applications does not suffer from performance interference, WSC operators resort to disallowing co-location of latency-sensitive applications with other applications. This policy translates to low machine utilization and millions of dollars wasted in WSCs.

This paper presents QoS-Compile, the first compilation approach that statically manipulates application contentiousness to enable the co-location of applications with varying QoS requirements, and as a result, can greatly improve machine utilization. Our technique first pinpoints an application's code regions that tend to cause contention and performance interference. QoS-Compile then transforms those regions to reduce their contentious nature. In essence, to co-locate applications of different QoS priorities, our compilation technique uses pessimizing transformations to throttle down the memory access rate of the contentious regions in low-priority applications, reducing their interference with high-priority applications. Our evaluation using synthetic benchmarks, SPEC benchmarks, and large-scale Google applications shows that QoS-Compile can greatly reduce contention, improve the QoS of applications, and improve machine utilization. Our experiments show that our technique improves applications' QoS performance by 21% and machine utilization by 36% on average.
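The core idea of a pessimizing transformation can be pictured with a small sketch. This is an illustrative example only, not code from the paper: it shows a memory-streaming loop and a hypothetical "niced" variant in which the compiler has padded the loop with cheap non-memory work every few iterations, throttling the rate at which the low-priority kernel issues memory requests (the function names and the `RATE`/`PAD` parameters are invented for illustration).

```c
#include <stddef.h>
#include <stdint.h>

/* Original contentious kernel: streams through a large array,
 * saturating shared cache and memory bandwidth. */
uint64_t sum_stream(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical "niced" version: every RATE-th iteration runs a burst
 * of filler work that touches no memory, spacing out the loop's
 * memory accesses and reducing pressure on shared resources. */
#define RATE 8   /* throttle granularity (power of two) */
#define PAD  32  /* amount of non-memory filler work per burst */
uint64_t sum_stream_niced(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
        if ((i & (RATE - 1)) == 0) {
            /* volatile prevents the filler loop from being optimized away */
            for (volatile int k = 0; k < PAD; k++)
                ;
        }
    }
    return s;
}
```

Both versions compute the same result; the transformation trades the low-priority application's own throughput for lower memory-subsystem interference, which is the intended deal when it shares a machine with a latency-sensitive workload.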



Citations
Proceedings ArticleDOI

Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers

TL;DR: Bubble-Flux is presented, an integrated dynamic interference measurement and online QoS management mechanism to provide accurate QoS control and maximize server utilization.
Proceedings ArticleDOI

Whare-map: heterogeneity in "homogeneous" warehouse-scale computers

TL;DR: This paper exposes and quantifies the performance impact of the "homogeneity assumption" for modern production WSCs using industry-strength, large-scale web-service workloads, and proposes "Whare-Map," the WSC Heterogeneity Aware Mapper, which leverages continuous profiling subsystems already in place in production environments.
Proceedings ArticleDOI

SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers

TL;DR: This paper demonstrates through a real-system investigation that the fundamental difference between resource sharing behaviors on CMP and SMT architectures calls for a redesign of the way the authors model interference, and proposes SMiTe, a methodology that enables precise performance prediction for SMT co-location on real-system commodity processors.
Proceedings ArticleDOI

Profile-guided automated software diversity

TL;DR: This work investigates the impact of profiling on an expensive diversification technique, NOP insertion, and finds that by differentiating between hot and cold code, even randomization techniques with a high performance overhead become practical.
Journal ArticleDOI

Machine Learning in Compiler Optimization

TL;DR: In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity; the main concepts of features, models, training, and deployment are introduced.
References
Journal ArticleDOI

Pin: building customized program analysis tools with dynamic instrumentation

TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation, and to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Book

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

TL;DR: The architecture of WSCs is described, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base are described.
Journal ArticleDOI

Web search for a planet: The Google cluster architecture

TL;DR: Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software that achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.
Proceedings ArticleDOI

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

TL;DR: In this article, the authors propose a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources.
Proceedings ArticleDOI

PowerNap: eliminating server idle power

TL;DR: The PowerNap concept, an energy-conservation approach where the entire system transitions rapidly between a high-performance active state and a near-zero-power idle state in response to instantaneous load, is proposed and the Redundant Array for Inexpensive Load Sharing (RAILS) is introduced.