Proceedings ArticleDOI
Matchmaking: distributed resource management for high throughput computing
Rajesh Raman,Miron Livny,Marvin Solomon +2 more
- pp 140-146
TLDR
The classified advertisement (classad) matchmaking framework is developed and implemented as a flexible and general approach to resource management in distributed environments with decentralized ownership of resources.
Abstract:
Conventional resource management systems use a system model to describe resources and a centralized scheduler to control their allocation. We argue that this paradigm does not adapt well to distributed systems, particularly those built to support high throughput computing. Obstacles include heterogeneity of resources, which makes uniform allocation algorithms difficult to formulate, and distributed ownership, which leads to widely varying allocation policies. Faced with these problems, we developed and implemented the classified advertisement (classad) matchmaking framework, a flexible and general approach to resource management in distributed environments with decentralized ownership of resources. Novel aspects of the framework include a semi-structured data model that combines schema, data, and query in a simple but powerful specification language, and a clean separation of the matching and claiming phases of resource allocation. The representation and protocols result in a robust, scalable, and flexible framework that can evolve with changing resources. The framework was designed to solve real problems encountered in the deployment of Condor, a high throughput computing system developed at the University of Wisconsin-Madison. Condor is heavily used by scientists at numerous sites around the world. It derives much of its robustness and efficiency from the matchmaking architecture.
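The two ideas the abstract highlights, advertisements that carry both data and a query, and matching kept separate from claiming, can be illustrated with a minimal Python sketch. This is not the classad language or Condor's protocol: the dictionary layout, the `requirements` key, the `claimed_by` field, and the functions `satisfies`, `match`, and `claim` are all hypothetical names chosen for illustration, and treating a missing attribute as "no match" is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for an advertisement: free-form attributes (data)
# plus an optional "requirements" predicate over the other party's ad (query).
Ad = Dict[str, Any]


def satisfies(ad: Ad, other: Ad) -> bool:
    """Return True if `ad`'s own requirements accept `other`."""
    requirements: Callable[[Ad], bool] = ad.get("requirements", lambda _other: True)
    try:
        return bool(requirements(other))
    except KeyError:
        # Assumption: an attribute missing from a semi-structured ad means no match.
        return False


def match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Matching phase: find a machine where BOTH sides' requirements hold."""
    for machine in machines:
        if satisfies(job, machine) and satisfies(machine, job):
            return machine
    return None


def claim(job: Ad, machine: Ad) -> bool:
    """Claiming phase: the resource re-checks its current state before accepting."""
    if machine.get("claimed_by") is not None:
        return False  # the advertised state was stale; someone else got there first
    if not (satisfies(job, machine) and satisfies(machine, job)):
        return False
    machine["claimed_by"] = job["owner"]
    return True


# Illustrative ads; every attribute name and value here is made up.
machines: List[Ad] = [
    {"name": "m1", "arch": "x86_64", "memory_mb": 512,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "m2", "arch": "x86_64", "memory_mb": 4096,
     "requirements": lambda job: job["owner"] != "guest"},
]
job: Ad = {
    "owner": "alice",
    "requirements": lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 1024,
}

chosen = match(job, machines)
if chosen is not None:
    print(chosen["name"], claim(job, chosen))  # prints "m2 True"
```

In this sketch `match` only introduces the two parties and never changes state, so a match computed from out-of-date advertisements does no harm: `claim` is where the resource owner's policy is re-evaluated and the allocation is actually made.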
Citations
Journal ArticleDOI
Distributed computing in practice: the Condor experience
TL;DR: The history and philosophy of the Condor project are presented, along with a description of how it has interacted with other projects and evolved along with the field of distributed computing.
Proceedings ArticleDOI
Grid information services for distributed resource sharing
TL;DR: This work presents an information services architecture that addresses performance, security, scalability, and robustness requirements of Grid software infrastructure and has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
Proceedings ArticleDOI
Large-scale cluster management at Google with Borg
TL;DR: A summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it are presented.
Journal ArticleDOI
A taxonomy and survey of grid resource management systems for distributed computing
TL;DR: In this article, an abstract model and a comprehensive taxonomy for describing resource management architectures are developed and used to identify approaches followed in the implementation of existing resource management systems for very large-scale network computing systems known as Grids.
Proceedings ArticleDOI
Quincy: fair scheduling for distributed computing clusters
TL;DR: It is argued that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures.
References
Journal ArticleDOI
Globus: a Metacomputing Infrastructure Toolkit
Ian Foster,Carl Kesselman +1 more
TL;DR: The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network, an integrated set of higher level services that enable applications to adapt to heterogeneous and dynamically changing metacomputing environments.
Proceedings ArticleDOI
Condor-a hunter of idle workstations
TL;DR: The design, implementation, and performance of the Condor scheduling system, which operates in a workstation environment, are presented, along with a performance profile of the system based on data accumulated from 23 stations during one month.
Proceedings Article
End-to-End Arguments in System Design.
TL;DR: A design principle is presented that helps guide placement of functions among the modules of a distributed computer system and suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level.
Journal ArticleDOI
End-to-end arguments in system design
TL;DR: The end-to-end argument as discussed by the authors suggests that functions placed at low levels of a distributed computer system may be redundant or of little value when compared with the cost of providing them at that low level.
Proceedings Article
A resource management architecture for metacomputing systems.
Krzysztof Czajkowski,Ian Foster,Nicholas T. Karonis,Carl Kesselman,Stuart Martin,Warren Smith,Steven Tuecke +6 more
TL;DR: This work describes a resource management architecture that distributes the resource management problem among distinct local manager, resource broker, and resource co-allocator components and defines an extensible resource specification language to exchange information about requirements.