A directory service for configuring high-performance distributed computations


The submitted manuscript has been created by the University of Chicago as Operator of Argonne National Laboratory ("Argonne") under Contract No. W-31-109-ENG-38 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.
A Directory Service for Configuring High-Performance Distributed Computations

Steven Fitzgerald,¹ Ian Foster,² Carl Kesselman,¹ Gregor von Laszewski,² Warren Smith,² Steven Tuecke²

¹ Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
² Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL 60439

http://www.globus.org/
Abstract

High-performance execution in distributed computing environments often requires careful selection and configuration not only of computers, networks, and other resources but also of the protocols and algorithms used by applications. Selection and configuration in turn require access to accurate, up-to-date information on the structure and state of available resources. Unfortunately, no standard mechanism exists for organizing or accessing such information. Consequently, different tools and applications adopt ad hoc mechanisms, or they compromise their portability and performance by using default configurations. We propose a Metacomputing Directory Service that provides efficient and scalable access to diverse, dynamic, and distributed information about resource structure and state. We define an extensible data model to represent required information and present a scalable, high-performance, distributed implementation. The data representation and application programming interface are adopted from the Lightweight Directory Access Protocol; the data model and implementation are new. We use the Globus distributed computing toolkit to illustrate how this directory service enables the development of more flexible and efficient distributed computing services and applications.
1 Introduction

High-performance distributed computing often requires careful selection and configuration of computers, networks, application protocols, and algorithms. These requirements do not arise in traditional distributed computing, where configuration problems can typically be avoided by the use of standard default protocols, interfaces, and so on. The situation is also quite different in traditional high-performance computing, where systems are usually homogeneous and hence can be configured manually. But in high-performance distributed computing, neither defaults nor manual configuration is acceptable. Defaults often do not result in acceptable performance, and manual configuration requires low-level knowledge of remote systems that an average programmer does not possess. We need an information-rich approach to configuration in which decisions are made (whether at compile-time, link-time, or run-time [19]) based upon information about the structure and state of the system on which a program is to run.
An example from the I-WAY networking experiment illustrates some of the difficulties associated with the configuration of high-performance distributed systems. The I-WAY was composed of massively parallel computers, workstations, archival storage systems, and visualization devices [6]. These resources were interconnected by both the Internet and a dedicated 155 Mb/sec IP-over-ATM network. In this environment, applications might run on a single or multiple parallel computers, of the same or different types. An optimal communication configuration for a particular situation might use vendor-optimized communication protocols within a computer but TCP/IP between computers over an ATM network (if available). A significant amount of information must be available to select such configurations, for example:

- What are the network interfaces (i.e., IP addresses) for the ATM network and the Internet?
- What is the raw bandwidth of the ATM network and the Internet, and which …



- Is the ATM network currently available?
- Between which pairs of nodes can we use vendor protocols to access fast internal networks?
- Between which pairs of nodes must we use TCP/IP?
Additional information is required if we use a resource location service to select an "optimal" set of resources from among the machines available on the I-WAY at a given time.

In our experience, such configuration decisions are not difficult if the right information is available. Until now, however, this information has not been easily available, and this lack of access has hindered application optimization. Furthermore, making this information available in a useful fashion is a nontrivial problem: the information required to configure high-performance distributed systems is diverse in scope, dynamic in value, distributed across the network, and detailed in nature.
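The kind of configuration decision described above can be sketched as a simple selection function over directory-supplied attributes. The attribute names below are hypothetical placeholders, not part of any actual I-WAY or Globus interface:

```python
# Illustrative sketch (not from the paper): choose a communication
# method for a pair of nodes from directory-style attributes.
# All attribute names here are hypothetical.

def choose_protocol(pair_info):
    """pair_info: dict of attributes describing a node pair."""
    if pair_info.get("same_host") and pair_info.get("vendor_protocol"):
        # Nodes share a fast internal network: prefer the
        # vendor-optimized protocol.
        return pair_info["vendor_protocol"]
    if pair_info.get("atm_available"):
        # Dedicated ATM link between machines: TCP/IP over ATM.
        return "tcp/ip-over-atm"
    # Fall back to TCP/IP over the shared Internet.
    return "tcp/ip"

print(choose_protocol({"same_host": True, "vendor_protocol": "mpl"}))
print(choose_protocol({"same_host": False, "atm_available": True}))
```

The point is not the selection logic, which is trivial, but that every branch depends on information (interface lists, link availability, protocol support) that must come from somewhere: this is the gap the directory service fills.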
In this article, we propose an approach to the design of high-performance distributed systems that addresses this need for efficient and scalable access to diverse, dynamic, and distributed information about the structure and state of resources. The core of this approach is the definition and implementation of a Metacomputing Directory Service (MDS) that provides a uniform interface to diverse information sources. We show how a simple data representation and application programming interface (API) based on the Lightweight Directory Access Protocol (LDAP) meet requirements for uniformity, extensibility, and distributed maintenance. We introduce a data model suitable for distributed computing applications and show how this model is able to represent computers and networks of interest. We also present novel implementation techniques for this service that address the unique requirements of high-performance applications. Finally, we use examples from the Globus distributed computing toolkit [9] to show how MDS data can be used to guide configuration decisions in realistic settings. We expect these techniques to be equally useful in other systems that support computing in distributed environments, such as Legion [12], NEOS [5], NetSolve [4], Condor [16], Nimrod [1], PRM [18], AppLeS [2], and heterogeneous implementations of MPI [13].
The principal contributions of this article are

- a new architecture for high-performance distributed computing systems, based upon an information service called the Metacomputing Directory Service;
- a design for this directory service, addressing issues of data representation, data model, and implementation;
- a data model able to represent the network structures commonly used by distributed computing systems, including various types of supercomputers; and
- a demonstration of the use of the information provided by MDS to guide resource and communication configuration within a distributed computing toolkit.
The rest of this article is organized as follows. In Section 2, we explain the requirements that a distributed computing information infrastructure must satisfy, and we propose MDS in response to these requirements. We then describe the representation (Section 3), the data model (Section 4), and the implementation (Section 5) of MDS. In Section 6, we demonstrate how MDS information is used within Globus. We conclude in Section 7 with suggestions for future research efforts.
2 Designing a Metacomputing Directory Service

The problem of organizing and providing access to information is a familiar one in computer science, and there are many potential approaches to the problem, ranging from database systems to the Simple Network Management Protocol (SNMP). The appropriate solution depends on the ways in which the information is produced, maintained, accessed, and used.
2.1 Requirements

Following are the requirements that shaped our design of an information infrastructure for distributed computing applications. Some of these requirements can be expressed in quantitative terms (e.g., scalability, performance); others are more subjective (e.g., expressiveness, deployability).

Performance. The applications of interest to us frequently operate on a large scale (e.g., hundreds of processors) and have demanding performance requirements. Hence, an information infrastructure must permit rapid access to frequently used configuration information. It is not acceptable to contact a server for every item: caching is required.

Scalability and cost. The infrastructure must scale to large numbers of components and permit concurrent access by many entities. At the same time, its organization must permit easy discovery of information. The human and resource costs (CPU cycles, disk space, network bandwidth) of creating and maintaining information must also be low, both at individual sites and in total.

Uniformity. Our goal is to simplify the development of tools and applications that use data to guide configuration decisions. We require a uniform data model as well as an application programming interface (API) for common operations on the data represented via that model. One aspect of this uniformity is a standard representation for data about common resources, such as processors and networks.

Expressiveness. We require a data model rich enough to represent relevant structure within distributed computing systems. A particular challenge is representing characteristics that span organizations, for example network bandwidth between sites.

Extensibility. Any data model that we define will be incomplete. Hence, the ability to incorporate additional information is important. For example, an application can use this facility to record specific information about its behavior (observed bandwidth, memory requirements) for use in subsequent runs.

Multiple information sources. The information that we require may be generated by many different sources. Consequently, an information infrastructure must integrate information from multiple sources.

Dynamic data. Some of the data required by applications is highly dynamic: for example, network availability or load. An information infrastructure must be able to make this data available in a timely fashion.

Flexible access. We require the ability to both read and update data contained within the information infrastructure. Some form of search capability is also required, to assist in locating stored data.

Security. It is important to control who is allowed to update configuration data. Some sites will also want to control access.

Deployability. An information infrastructure is useful only if it is broadly deployed. In the current case, we require techniques that can be installed and maintained easily at many sites.

Decentralized maintenance. It must be possible to delegate the task of creating and maintaining information about resources to the sites at which resources are located. This delegation is important for both scalability and security reasons.
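The Performance and Dynamic data requirements above pull in opposite directions: caching avoids contacting a server for every item, but cached values must expire quickly enough to track dynamic state such as load. One common way to balance the two is a time-to-live (TTL) cache in front of the lookup, with long TTLs for static attributes and short TTLs for dynamic ones. The sketch below is illustrative only; its lookup function and class are not part of MDS:

```python
import time

# Illustrative sketch: a time-to-live (TTL) cache in front of a
# directory lookup. The lookup function and TTL values are
# hypothetical, not MDS's actual caching mechanism.

class TTLCache:
    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup     # function: key -> value (e.g., a server query)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}        # key -> (value, expiry_time)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]         # fresh cached value: no server contact
        value = self._lookup(key) # stale or missing: query the source
        self._entries[key] = (value, now + self._ttl)
        return value

# Usage: static data (CPU count) can use a long TTL; dynamic data
# (current load) a short one.
calls = []
cache = TTLCache(lambda key: calls.append(key) or len(calls), ttl_seconds=60)
cache.get("cpu-count")   # first access queries the source
cache.get("cpu-count")   # second access is served from the cache
```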
2.2 Approaches

It is instructive to review, with respect to these requirements, the various (incomplete) approaches to information infrastructure that have been used by distributed computing systems.

Operating system commands such as uname and sysinfo can provide important information about a particular machine but do not support remote access. SNMP [21] and the Network Information Service (NIS) both permit remote access but are defined within the context of the IP protocol suite, which can add significant overhead to a high-performance computing environment. Furthermore, SNMP does not define an API, thus preventing its use as a component within other software architectures.
High-performance computing systems such as PVM [11], p4 [3], and MPICH [13] provide rapid access to configuration data by placing this data (e.g., machine names, network interfaces) into files maintained by the programmer, called "hostfiles." However, lack of support for remote access means that hostfiles must be replicated at each host, complicating maintenance and dynamic update.
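As a concrete illustration, a hostfile binds static configuration into per-host text. The format and hostnames below are a simplified, hypothetical example, not the exact PVM or MPICH syntax:

```python
# Illustrative sketch: parsing a simplified hostfile of the kind
# PVM/MPICH-era systems used. The format and hostnames here are
# hypothetical; real hostfile syntaxes differ between systems.

HOSTFILE = """\
# host             network-interface
dark.example.org   10.0.0.1
flash.example.org  10.0.0.2
"""

def parse_hostfile(text):
    """Return {hostname: interface} from a simplified hostfile."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, interface = line.split()
        hosts[name] = interface
    return hosts

print(parse_hostfile(HOSTFILE))
```

Because every machine needs its own copy of this file, any change (a new node, a changed interface) must be propagated to all replicas by hand. This is precisely the maintenance and dynamic-update problem that a shared directory service removes.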
The Domain Name Service (DNS) provides a highly distributed, scalable service for resolving Internet addresses to values (e.g., IP addresses) but is not, in general, extensible. Furthermore, its update strategies are designed to support values that change relatively rarely.
The X.500 standard [14, 20] defines a directory service that can be used to provide extensible distributed directory services within a wide area environment. A directory service is a service that provides read-optimized access to general data about entities, such as people, corporations, and computers. X.500 provides a framework that could, in principle, be used to organize the information that is of interest to us. However, it is complex and requires ISO protocols and the heavyweight ASN.1 encodings of data. For these and other reasons, it is not widely used.
The Lightweight Directory Access Protocol [24] is a streamlined version of the X.500 directory service. It removes the requirement for an ISO protocol stack, defining a standard wire protocol based on the IP protocol suite. It also simplifies the data encoding and command set of X.500 and defines a standard API for directory access [15]. LDAP is seeing wide-scale deployment as the directory service of choice for the World Wide Web. Disadvantages include its only moderate performance (see Section 5), limited access to external data sources, and rigid approach to distributing data across servers.
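LDAP-style directories name each entry with a distinguished name (DN): an ordered list of attribute=value components running from the entry up to the root. The toy in-memory directory below shows how DN components induce the tree that a subtree search walks; the organization names and attributes are invented for illustration, not the MDS schema:

```python
# Illustrative sketch: how LDAP-style distinguished names (DNs)
# induce a hierarchical namespace. The names and attributes below
# are invented examples, not the actual MDS data model.

def dn_components(dn):
    """Split 'ou=MCS, o=Argonne, c=US' into root-first components."""
    parts = [p.strip() for p in dn.split(",")]
    return list(reversed(parts))            # root (c=US) first

class Directory:
    def __init__(self):
        self._entries = {}                  # DN tuple -> attribute dict

    def add(self, dn, attributes):
        self._entries[tuple(dn_components(dn))] = attributes

    def search(self, base_dn, attr, value):
        """Find DNs under base_dn whose attr equals value (subtree search)."""
        base = tuple(dn_components(base_dn))
        return [key for key, attrs in self._entries.items()
                if key[:len(base)] == base and attrs.get(attr) == value]

d = Directory()
d.add("hn=node1.example.org, ou=MCS, o=Argonne, c=US", {"type": "computer"})
d.add("nn=atm-net, ou=MCS, o=Argonne, c=US", {"type": "network"})
print(d.search("o=Argonne, c=US", "type", "network"))
```

Because a DN prefix identifies an entire subtree, responsibility for a subtree can be delegated to the site that owns it, which is how a directory service meets the decentralized-maintenance requirement of Section 2.1.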
Reviewing these various systems, we see that each is in some way incomplete, failing to address the types of information needed to build high-performance distributed computing systems, being too slow, or not defining an API to enable uniform access to the service. For these reasons, we have defined our own metacomputing information infrastructure that integrates existing systems while providing a uniform and extensible data model, support for multiple information service providers, and a uniform API.
2.3 A Metacomputing Directory Service

Our analysis of requirements and existing systems leads us to define what we call the Metacomputing Directory Service …

Citations

- Globus: a Metacomputing Infrastructure Toolkit
- Grid information services for distributed resource sharing
- The network weather service: a distributed resource performance forecasting service for metacomputing
- The data grid
- A taxonomy and survey of grid resource management systems for distributed computing
References

- Globus: a Metacomputing Infrastructure Toolkit
- Condor: a hunter of idle workstations
- A high-performance, portable implementation of the MPI message passing interface standard
- PVM: Parallel Virtual Machine: a users' guide and tutorial for networked parallel computing