Proceedings Article

Matchmaking: distributed resource management for high throughput computing

TLDR
The classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources, is developed and implemented.
Abstract
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, which leads to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalable, and flexible framework that can evolve with changing resources. The framework was designed to solve real problems encountered in the deployment of Condor, a high throughput computing system developed at the University of Wisconsin-Madison. Condor is heavily used by scientists at numerous sites around the world. It derives much of its robustness and efficiency from the matchmaking architecture.
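
The two ideas named in the abstract, self-describing advertisements and a matching phase kept separate from the claiming phase, can be illustrated with a minimal Python sketch. This is not the classad language or Condor's implementation: the dictionary representation, the attribute names (`Memory`, `Arch`, `Owner`, `Requirements`, `Rank`), and the helper functions are all assumptions made for illustration, and constraints are written as plain Python callables rather than expressions in a specification language.

```python
from typing import Any, Dict, List, Optional

# Hypothetical representation: an ad is a schema-free dict. Attributes,
# constraints ("Requirements"), and preferences ("Rank") live side by side.
Ad = Dict[str, Any]


def requirements_met(ad: Ad, other: Ad) -> bool:
    """Evaluate ad's own constraint against the other party's ad."""
    requirement = ad.get("Requirements")
    if requirement is None:
        return True  # no constraint advertised: accept anything
    try:
        return bool(requirement(ad, other))
    except KeyError:
        # No fixed schema: a constraint that refers to an attribute the
        # other ad does not advertise is treated as unsatisfied.
        return False


def match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Matching phase: find the best offer whose constraints hold both ways."""
    compatible = [
        offer for offer in offers
        if requirements_met(request, offer) and requirements_met(offer, request)
    ]
    if not compatible:
        return None
    rank = request.get("Rank", lambda my, other: 0)
    return max(compatible, key=lambda offer: rank(request, offer))


def claim(request: Ad, offer: Ad) -> bool:
    """Claiming phase: the matched parties re-check against current state.

    A match is only an introduction. The resource may have changed since it
    advertised, so its owner's policy is evaluated again at claim time.
    """
    return requirements_met(offer, request) and requirements_met(request, offer)


machines: List[Ad] = [
    {"Name": "ws1", "Arch": "INTEL", "Memory": 64,
     "Requirements": lambda my, other: other["Owner"] != "guest"},
    {"Name": "ws2", "Arch": "INTEL", "Memory": 128,
     "Requirements": lambda my, other: other["Owner"] != "guest"},
    {"Name": "ws3", "Arch": "SPARC", "Memory": 256},
]

job: Ad = {
    "Owner": "alice",
    "Requirements": lambda my, other: other["Arch"] == "INTEL" and other["Memory"] >= 64,
    "Rank": lambda my, other: other["Memory"],  # prefer more memory
}

chosen = match(job, machines)
claimed_before = claim(job, chosen)

# The owner's policy changes after the match but before the claim.
chosen["Requirements"] = lambda my, other: False
claimed_after = claim(job, chosen)
```

In this sketch the matchmaker never allocates anything: it only pairs ads, and the final decision stays with the resource, which is one way to read the abstract's point about decentralized ownership and widely varying allocation policies.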


Citations
Journal Article

Distributed computing in practice: the Condor experience

TL;DR: The history and philosophy of the Condor project are presented, along with how the project has interacted with other projects and evolved with the field of distributed computing.
Proceedings Article

Grid information services for distributed resource sharing

TL;DR: This work presents an information services architecture that addresses the performance, security, scalability, and robustness requirements of Grid software infrastructure. It has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
Proceedings Article

Large-scale cluster management at Google with Borg

TL;DR: A summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it are presented.
Journal Article

A taxonomy and survey of grid resource management systems for distributed computing

TL;DR: An abstract model and a comprehensive taxonomy for describing resource management architectures are developed and used to identify the approaches followed in existing resource management systems for very large-scale network computing systems known as Grids.
Proceedings Article

Quincy: fair scheduling for distributed computing clusters

TL;DR: It is argued that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures.