Journal Article

Improving utilization of heterogeneous clusters

Esteban Stafford, +1 more
27 Jan 2020, Vol. 76, Iss. 11, pp. 8787-8800
TLDR
This article presents a novel way of expressing computational capacity that is better suited to heterogeneous clusters, and advocates task migration to further improve utilization.
Abstract
This work has been supported by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network) and by the European HiPEAC Network of Excellence.


Citations
Journal Article

Performance and energy task migration model for heterogeneous clusters

TL;DR: Presents a set of linear regression models that predict the impact of task migration on objectives such as performance and energy consumption, using a small set of easily measurable parameters.
Proceedings Article

A Simulator for Intelligent Workload Managers in Heterogeneous Clusters

TL;DR: This paper proposes a simulation framework for studying workload managers based on deep reinforcement learning (DRL) techniques. The framework simulates heterogeneous clusters built on multicore architectures, taking into account contention in shared memory access and energy consumption.
Book Chapter

Task Scheduler for Heterogeneous Data Centres Based on Deep Reinforcement Learning

TL;DR: In this paper, the authors leverage machine learning techniques to develop a workload manager intended to improve the efficiency of modern data centres. The manager not only chooses the most adequate job for scheduling, but also determines the best compute resources for its execution.
References
Journal Article

Exploiting process lifetime distributions for dynamic load balancing

TL;DR: The measurements indicate that the distribution of lifetimes for a UNIX process is Pareto (heavy-tailed), with a consistent functional form over a variety of workloads, and it is shown how to apply this distribution to derive a preemptive migration policy that requires no hand-tuned parameters.
Proceedings Article

DMTCP: Transparent checkpointing for cluster computations and the desktop

TL;DR: DMTCP is a transparent user-level checkpointing package for distributed applications. It is used for the runCMS experiment of the Large Hadron Collider at CERN, and it can be incorporated and distributed as a checkpoint-restart module within some larger package.
Proceedings Article

Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning

TL;DR: ACES's energy savings are close to the optimal and it delivers power proportionality while balancing the tradeoff between energy savings and reliability costs.
Journal Article

A Survey of Task Allocation and Load Balancing in Distributed Systems

TL;DR: This survey categorizes and reviews representative studies on task allocation and load balancing according to the general characteristics of different distributed systems, and builds a comprehensive taxonomy of them.
Proceedings Article

Lifetime or energy: Consolidating servers with reliability control in virtualized cloud datacenters

TL;DR: This paper proposes a Reliability-Aware server Consolidation stratEgy, named RACE, to address when and how to perform energy-efficient server consolidation in a reliability-friendly and profitable way. It also develops a utility model that unifies multiple constraints on performance SLAs, reliability factors, and energy costs in a holistic manner.