Matchmaking: distributed resource management for high throughput computing

doi:10.1109/HPDC.1998.709966

Proceedings Article

Rajesh Raman, +2 more

pp. 140-146

TLDR

The authors develop and implement the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources.

Abstract:

Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, which leads to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalable, and flexible framework that can evolve with changing resources. The framework was designed to solve real problems encountered in the deployment of Condor, a high throughput computing system developed at the University of Wisconsin-Madison. Condor is heavily used by scientists at numerous sites around the world, and it derives much of its robustness and efficiency from the matchmaking architecture.
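The abstract describes two ideas that are easier to see in code: each advertisement carries its own constraints and preferences about the other party (so matching is bilateral), and matching is kept separate from claiming. The following is a minimal Python sketch under loudly stated assumptions, not Condor's implementation: a classad is modeled as a plain dict, Requirements and Rank are Python callables rather than expressions in the real ClassAd language, the attribute names (OpSys, Memory, ImageSize, KeyboardIdle, Owner) are styled after Condor's but used only for illustration, a missing attribute simply fails a constraint as a crude stand-in for ClassAd undefined-value semantics, and all function names (`matchmake`, `claim`, and so on) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumption: a classad is a plain dict. "Requirements" and "Rank" hold
# Python callables f(my_ad, target_ad), not real ClassAd expressions.
ClassAd = Dict[str, Any]
Expr = Callable[[ClassAd, ClassAd], Any]


def requirements_met(my: ClassAd, target: ClassAd) -> bool:
    """True if `my` ad's Requirements accept `target`; absent means accept."""
    req: Optional[Expr] = my.get("Requirements")
    if req is None:
        return True
    try:
        return bool(req(my, target))
    except KeyError:
        # Crude stand-in for undefined-value semantics: referring to an
        # attribute the other ad lacks fails the constraint.
        return False


def rank(my: ClassAd, target: ClassAd) -> float:
    """How much `my` prefers `target`; higher is better, absent means 0."""
    r: Optional[Expr] = my.get("Rank")
    if r is None:
        return 0.0
    try:
        return float(r(my, target))
    except KeyError:
        return 0.0


def symmetric_match(a: ClassAd, b: ClassAd) -> bool:
    """Bilateral matching: both parties' constraints must hold."""
    return requirements_met(a, b) and requirements_met(b, a)


def matchmake(job: ClassAd, machines: List[ClassAd]) -> Optional[ClassAd]:
    """Matching phase: return the compatible machine the job ranks highest."""
    candidates = [m for m in machines if symmetric_match(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: rank(job, m))


def claim(job: ClassAd, machine_now: ClassAd) -> bool:
    """Claiming phase (assumed behavior): the matched parties talk directly
    and re-check the match against the machine's current state, since its
    advertisement may be stale by the time the job arrives."""
    return symmetric_match(job, machine_now)


job = {
    "Owner": "alice",
    "ImageSize": 512,
    "Requirements": lambda my, t: t["OpSys"] == "LINUX" and t["Memory"] >= my["ImageSize"],
    "Rank": lambda my, t: t["Memory"],  # prefer machines with more memory
}
machines = [
    {"Name": "m1", "OpSys": "LINUX", "Memory": 256},
    {"Name": "m2", "OpSys": "LINUX", "Memory": 1024, "KeyboardIdle": 1200,
     "Requirements": lambda my, t: my["KeyboardIdle"] > 900 and t["Owner"] != "mallory"},
    {"Name": "m3", "OpSys": "LINUX", "Memory": 2048, "KeyboardIdle": 5,
     "Requirements": lambda my, t: my["KeyboardIdle"] > 900},
    {"Name": "m4", "OpSys": "LINUX", "Memory": 768},
]

best = matchmake(job, machines)
print(best["Name"] if best else None)  # m2 (m1 too small, m3 refuses, m4 ranked lower)
print(claim(job, best))  # True
print(claim(job, {**best, "KeyboardIdle": 10}))  # False: machine state changed
```

In this sketch the matchmaker holds no lock on the machine it selects; correctness rests on the claim-time re-check. That is one plausible reading of why the abstract stresses separating the matching and claiming phases, not a description of Condor's actual protocol.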

Citations

Journal Article

Distributed computing in practice: the Condor experience

Douglas Thain, +2 more

01 Feb 2005

Concurrency and Computation: Practice an...

TL;DR: The history and philosophy of the Condor project is provided and how it has interacted with other projects and evolved along with the field of distributed computing is described.

Proceedings Article

Grid information services for distributed resource sharing

Karl Czajkowski, +3 more

TL;DR: This work presents an information services architecture that addresses performance, security, scalability, and robustness requirements of Grid software infrastructure; it has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.

Proceedings Article

Large-scale cluster management at Google with Borg

Abhishek Verma, +5 more

TL;DR: A summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it are presented.

Journal Article

A taxonomy and survey of grid resource management systems for distributed computing

Klaus Krauter, +2 more

01 Feb 2002

Software - Practice and Experience

TL;DR: In this article, an abstract model and a comprehensive taxonomy for describing resource management architectures is developed, which is used to identify approaches followed in the implementation of existing resource management systems for very large-scale network computing systems known as Grids.

Proceedings Article

Quincy: fair scheduling for distributed computing clusters

Michael Isard, +5 more

TL;DR: It is argued that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures.

References

Journal Article

Globus: a Metacomputing Infrastructure Toolkit

Ian Foster, +1 more

TL;DR: The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network: an integrated set of higher-level services that enable applications to adapt to heterogeneous and dynamically changing metacomputing environments.

Proceedings Article

Condor-a hunter of idle workstations

M. Litzkow, +2 more

TL;DR: The design, implementation, and performance of the Condor scheduling system, which operates in a workstation environment, are presented and a performance profile of the system is presented that is based on data accumulated from 23 stations during one month.

Proceedings Article

End-to-End Arguments in System Design.

Jerome H. Saltzer, +2 more

TL;DR: A design principle is presented that helps guide placement of functions among the modules of a distributed computer system and suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level.

Journal Article

End-to-end arguments in system design

Jerome H. Saltzer, +2 more

01 Nov 1984

ACM Transactions on Computer Systems

TL;DR: The end-to-end argument suggests that functions placed at low levels of a distributed computer system may be redundant or of little value when compared with the cost of providing them at that low level.

Proceedings Article

A resource management architecture for metacomputing systems.

Krzysztof Czajkowski, +6 more

TL;DR: This work describes a resource management architecture that distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements.

Related Papers (5)

Condor-a hunter of idle workstations

The Grid 2: Blueprint for a New Computing Infrastructure

The Anatomy of the Grid: Enabling Scalable Virtual Organizations

Globus: a Metacomputing Infrastructure Toolkit

The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration