
Showing papers in "Cluster Computing in 1998"


Journal ArticleDOI
TL;DR: The Network Weather Service (NWS) as discussed by the authors is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments, including TCP/IP end-to-end throughput and latency.
Abstract: The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments. In this paper, we outline its design and detail the predictive performance of the forecasts it generates. While the forecasting methods are general, we focus on their ability to predict the TCP/IP end-to-end throughput and latency that is attainable by an application using systems located at different sites. Such network forecasts are needed both to support scheduling (Berman et al., 1996) and, by the metacomputing software infrastructure, to develop quality-of-service guarantees (DeFanti et al., to appear; Grimshaw et al., 1994).
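The forecasting approach the abstract describes can be sketched as a small "pick the best predictor" loop: run several cheap forecasters over the measurement history and report the one with the lowest error so far. This is an illustrative reconstruction, not the NWS implementation; the particular predictor set and the mean-absolute-error metric are assumptions.

```python
def forecast(history):
    """Predict the next measurement by replaying several simple
    forecasters over the history and using whichever had the lowest
    mean absolute error (NWS-style adaptive forecasting sketch)."""
    predictors = {
        "last":   lambda h, i: h[i - 1],                  # last observed value
        "mean":   lambda h, i: sum(h[:i]) / i,            # running mean
        "median": lambda h, i: sorted(h[:i])[i // 2],     # running median
    }
    # Score each predictor on every past point it could have predicted.
    errors = {}
    for name, p in predictors.items():
        errs = [abs(p(history, i) - history[i]) for i in range(1, len(history))]
        errors[name] = sum(errs) / len(errs)
    best = min(errors, key=errors.get)
    return best, predictors[best](history, len(history))
```

For a steadily rising series the "last value" predictor wins; for noisy-but-stationary throughput measurements the mean or median tends to take over, which is the point of selecting dynamically.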

436 citations


Journal ArticleDOI
TL;DR: A new multicast protocol for multihop mobile wireless networks where a group of nodes in charge of forwarding multicast packets is designated according to members' requests, making the protocol more robust to mobility.
Abstract: In this paper we propose a new multicast protocol for multihop mobile wireless networks. Instead of forming multicast trees, a group of nodes in charge of forwarding multicast packets is designated according to members’ requests. Multicast is then carried out via “scoped” flooding over such a set of nodes. The forwarding group is periodically refreshed to handle topology/membership changes. Multicast using a forwarding group takes advantage of wireless broadcast transmissions and reduces channel and storage overhead, thus improving performance and scalability. The key innovation with respect to wired multicast schemes like DVMRP is the use of flags rather than upstream/downstream link state, making the protocol more robust to mobility. The dynamic reconfiguration capability makes this protocol particularly suitable for mobile networks. The performance of the proposed scheme is evaluated via simulation and compared to that of DVMRP and global flooding.
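The per-node forwarding rule implied by the abstract is very small: each node keeps a boolean forwarding-group flag plus a duplicate cache, and relays a multicast packet only if the flag is set and the packet is new. A minimal sketch, with names of my own choosing rather than the paper's:

```python
class Node:
    """One node's view of forwarding-group multicast via scoped flooding."""

    def __init__(self, node_id, in_forwarding_group=False):
        self.node_id = node_id
        self.fg_flag = in_forwarding_group  # refreshed periodically by member requests
        self.seen = set()                   # duplicate suppression for the flood

    def on_receive(self, packet_id):
        """Return True if this node rebroadcasts the packet."""
        if packet_id in self.seen:
            return False                    # already handled; drop the duplicate
        self.seen.add(packet_id)
        return self.fg_flag                 # only forwarding-group nodes relay
```

Because the only per-node state is a flag and a packet cache, refreshing the forwarding group under mobility amounts to flipping flags, which is the robustness argument made against tree-based schemes like DVMRP.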

360 citations


Journal ArticleDOI
TL;DR: This work introduces a self-organizing network structure called a spine and proposes a spine-based routing infrastructure for ad hoc networks, with two spine routing algorithms: (a) Optimal Spine Routing (OSR), which uses full and up-to-date knowledge of the network topology, and (b) Partial-knowledge Spine Routing (PSR), which uses partial knowledge of the network topology.
Abstract: An ad hoc network is a multihop wireless network in which mobile hosts communicate without the support of a wired backbone for routing messages. We introduce a self organizing network structure called a spine and propose a spine-based routing infrastructure for routing in ad hoc networks. We propose two spine routing algorithms: (a) Optimal Spine Routing (OSR), which uses full and up-to-date knowledge of the network topology, and (b) Partial-knowledge Spine Routing (PSR), which uses partial knowledge of the network topology. We analyze the two algorithms and identify the optimality-overhead trade-offs involved in these algorithms.

110 citations


Journal ArticleDOI
TL;DR: Three adaptive cache invalidation report methods are presented, in which the server broadcasts different invalidation reports according to the update and query rates/patterns and client disconnection time while spending little uplink cost.
Abstract: Caching of frequently accessed data items can reduce the bandwidth requirement in a mobile wireless computing environment. Periodical broadcast of invalidation reports is an efficient cache invalidation strategy. However, this strategy is severely affected by the disconnection and mobility of the clients. In this paper, we present three adaptive cache invalidation report methods, in which the server broadcasts different invalidation reports according to the update and query rates/patterns and client disconnection time while spending little uplink cost. Simulation results show that the adaptive invalidation methods are efficient in improving mobile caching and reducing the uplink and downlink costs without degrading the system throughput.
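The client-side half of the invalidation-report strategy can be sketched in a few lines: when a periodic report arrives, a client that was disconnected longer than the window the report covers must drop its whole cache, since it cannot verify anything; otherwise it evicts only the reported items. The window constant and function shape are assumptions for illustration, not the paper's three adaptive variants.

```python
WINDOW = 10  # time units of updates each broadcast report covers (assumed)

def process_report(cache, last_report_time, now, updated_ids):
    """Apply a periodic invalidation report at a mobile client.
    cache maps item_id -> value; returns the (possibly emptied) cache."""
    if now - last_report_time > WINDOW:
        # Disconnected longer than the report window: nothing is verifiable.
        return {}
    for item_id in updated_ids:
        cache.pop(item_id, None)  # evict items the server reported as updated
    return cache
```

The adaptive methods in the paper vary what the report contains (driven by update/query rates and disconnection times) precisely to make the "drop everything" branch rare for long-disconnected clients.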

88 citations


Journal ArticleDOI
Allen B. Downey
TL;DR: In this paper, the authors developed a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center, which gives insight into the performance of strategies for scheduling moldable jobs on space-sharing parallel computers.
Abstract: We develop a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center. This model gives us insight into the performance of strategies for scheduling moldable jobs on space-sharing parallel computers. We find that Adaptive Static Partitioning (ASP), which has been reported to work well for other workloads, does not perform as well as strategies that adapt better to system load. The best of the strategies we consider is one that explicitly reduces allocations when load is high (a variation of Sevcik’s (1989) A+ strategy).

50 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe the current protocol specification for the Route Optimization protocol, concentrating on design decisions and justifications, and describe the security operations that enable reliable operation of this handoff between foreign agents with which a mobile node has no pre-existing security relationship.
Abstract: Route Optimization has been designed within the IETF to ameliorate the problem of triangle routing, a routing artifact introduced by Mobile IP’s requirement to route packets destined for a mobile node by way of its home network. In this article, we describe the current protocol specification for the Route Optimization protocol, concentrating on design decisions and justifications. Once the basic mechanisms are explained, we show how they are applied to enable foreign agents to offer smooth handoffs for mobile nodes, and describe the security operations that enable reliable operation of this handoff between foreign agents with which a mobile node has no pre-existing security relationship.

49 citations


Journal ArticleDOI
TL;DR: It is shown that both SPMD and task parallel applications can be scheduled effectively in a shared heterogeneous LAN environment containing ethernet and ATM networks by exploiting the application structure and dynamic run-time information.
Abstract: Prophet is a run-time scheduling system designed to support the efficient execution of parallel applications written in the Mentat programming language (Grimshaw, 1993). Prior results demonstrated that SPMD applications could be scheduled automatically in an ethernet-based local-area workstation network with good performance (Weissman and Grimshaw, 1994 and 1995). This paper describes our recent efforts to extend Prophet along several dimensions: improved overhead control, greater resource sharing, greater resource heterogeneity, wide-area scheduling, and new application types. We show that both SPMD and task parallel applications can be scheduled effectively in a shared heterogeneous LAN environment containing ethernet and ATM networks by exploiting the application structure and dynamic run-time information.

46 citations


Journal ArticleDOI
TL;DR: A security-enhanced version of a communication library is developed, which is then used to provide secure versions of various parallel libraries and languages, including the popular Message Passing Interface, to support the development of applications that use high-speed networks to connect geographically distributed supercomputers, databases, and scientific instruments.
Abstract: We describe a software infrastructure designed to support the development of applications that use high-speed networks to connect geographically distributed supercomputers, databases, and scientific instruments. Such applications may need to operate over open networks and access valuable resources, and hence can require mechanisms for ensuring integrity and confidentiality of communications and for authenticating both users and resources. Yet security solutions developed for traditional client-server applications do not provide direct support for the distinctive program structures, programming tools, and performance requirements encountered in these applications. To address these requirements, we are developing a security-enhanced version of a communication library called Nexus, which is then used to provide secure versions of various parallel libraries and languages, including the popular Message Passing Interface. These tools support the wide range of process creation mechanisms and communication structures used in high-performance computing. They also provide a fine degree of control over what, where, and when security mechanisms are applied. In particular, a single application can mix secure and nonsecure communication, allowing the programmer to make fine-grained security/performance tradeoffs. We present performance results that enable us to quantify the performance of our infrastructure.

43 citations


Journal ArticleDOI
TL;DR: This paper describes two versions of an ATM-based VCM implementation, which differ in the way they use the memory on the network adapter, and evaluates the scalability of the architecture to multiple VCM-based network interfaces per host.
Abstract: This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by high-speed networks. The architecture addresses what is considered one of the most important problems of cluster-based parallel computing: the inherent inability of scaling the performance of communication software along with the host CPU performance. The Virtual Communication Machine (VCM), resident on the network coprocessor, presents a scalable software solution by providing configurable communication functionality directly accessible at user-level. The VCM architecture is configurable in that it enables the transfer to the VCM of selected communication-related functionality that is traditionally part of the application and/or the host kernel. Such transfers are beneficial when a significant reduction of the host CPU’s load translates into a small increase in the coprocessor’s load. The functionality implemented by the coprocessor is available at the application level as VCM instructions. Host CPU(s) and coprocessor interact through shared memory regions, thereby avoiding expensive CPU context switches. The host kernel is not involved in these interactions; it simply “connects” the application to the VCM during the initialization phase and is called infrequently to handle exceptional conditions. Protection is enforced by the VCM based on information supplied by the kernel. The VCM-based communication architecture admits low-cost and open implementations, as demonstrated by its current ATM-based implementation based on off-the-shelf hardware components and using standard AAL5 packets. The architecture makes it easy to implement communication software that exhibits negligible overheads on the host CPU(s) and offers latencies and bandwidths close to the hardware limits of the underlying network.
These characteristics are due to the VCM’s support for zero-copy messaging with gather/scatter capabilities and the VCM’s direct access to any data structure in an application’s address space. This paper describes two versions of an ATM-based VCM implementation, which differ in the way they use the memory on the network adapter. Their performance under heavy load is compared in the context of a synthetic client/server application. The same application is used to evaluate the scalability of the architecture to multiple VCM-based network interfaces per host. Parallel implementations of the Traveling Salesman Problem and of Georgia Tech Time Warp, an engine for discrete-event simulation, are used to demonstrate VCM functionality and the high performance of its implementation. The distributed- and shared-memory versions of these two applications exhibit comparable performance, despite the significant cost-performance advantage of the distributed-memory platform.

23 citations


Journal ArticleDOI
TL;DR: Theoretical bounds on the maximum obtainable efficiency of multicasting data to mobile users in a cellular mobile network are developed, and algorithms that achieve this bound are presented.
Abstract: We consider the problem of multicasting data to mobile users in a cellular mobile network. In the absence of mobility, a single channel can be used to multicast to all mobile users within a cell. However, mobility combined with the effects of fading necessitates a more complex channel allocation policy. In this paper we develop theoretical bounds on the maximum obtainable efficiency and present algorithms that achieve this bound. Our results hold for the case when mobiles travel on a highway, as well as for the more general case where mobiles roam in a two-dimensional region.

23 citations


Journal ArticleDOI
TL;DR: This paper presents the Virtual Distributed Computing Environment (VDCE), a metacomputing environment currently being developed at Syracuse University that provides an efficient web-based approach for developing, evaluating, and visualizing large-scale distributed applications that are based on predefined task libraries on diverse platforms.
Abstract: Current advances in high-speed networks such as ATM and fiber-optics, and software technologies such as the JAVA programming language and WWW tools, have made network-based computing a cost-effective, high-performance distributed computing environment. Metacomputing, a special subset of network-based computing, is a well-integrated execution environment derived by combining diverse and distributed resources such as MPPs, workstations, mass storage, and databases that show a heterogeneous nature in terms of hardware, software, and organization. In this paper we present the Virtual Distributed Computing Environment (VDCE), a metacomputing environment currently being developed at Syracuse University. VDCE provides an efficient web-based approach for developing, evaluating, and visualizing large-scale distributed applications that are based on predefined task libraries on diverse platforms. The VDCE task libraries relieve end-users of tedious task implementations and also support reusability. The VDCE software architecture is described in terms of three modules: (a) the Application Editor, a user-friendly application development environment that generates the Application Flow Graph (AFG) of an application; (b) the Application Scheduler, which provides an efficient task-to-resource mapping of AFGs; and (c) the VDCE Runtime System, which is responsible for running and managing application execution and for monitoring the VDCE resources. We present experimental results of an application execution on the VDCE prototype for evaluating the performance of different machine and network configurations. We also show how the VDCE can be used as a problem-solving environment on which large-scale, network-centric applications can be developed by a novice programmer rather than by an expert in low-level details of parallel programming languages.

Journal ArticleDOI
TL;DR: Analysis results indicate that collision resolution makes floor acquisition multiple access much more effective.
Abstract: The collision avoidance and resolution multiple access (CARMA) protocol is presented and analyzed. CARMA uses a collision avoidance handshake in which the sender and receiver exchange a request to send (RTS) and a clear to send (CTS) before the sender transmits any data. CARMA is based on carrier sensing, together with collision resolution based on a deterministic tree-splitting algorithm. For analytical purposes, an upper bound is derived for the average number of steps required to resolve collisions of RTSs using the tree-splitting algorithm. This bound is then applied to the computation of the average channel utilization in a fully connected network with a large number of stations. Under light-load conditions, CARMA achieves the same average throughput as multiple access protocols based on RTS/CTS exchange and carrier sensing. It is also shown that, as the arrival rate of RTSs increases, the throughput achieved by CARMA is close to the maximum throughput that any protocol based on collision avoidance (i.e., RTS/CTS exchange) can achieve if the control packets used to acquire the floor are much smaller than the data packet trains sent by the stations. Simulation results validate the simplifying approximations made in the analytical model. Our analysis results indicate that collision resolution makes floor acquisition multiple access much more effective.
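The deterministic tree-splitting step at the heart of CARMA can be illustrated with a toy resolver: stations whose RTSs collide are split into two subsets (here by successive bits of their IDs, an assumption for illustration), and each subset is resolved recursively; counting the slots used gives the quantity the paper bounds analytically.

```python
def resolve(stations, depth=0):
    """Resolve an RTS collision by deterministic tree splitting.
    stations: unique integer IDs; each level splits on one ID bit.
    Returns (order in which stations succeed, number of slots used)."""
    if len(stations) <= 1:
        return list(stations), 1             # an idle or a success slot
    left = [s for s in stations if not (s >> depth) & 1]
    right = [s for s in stations if (s >> depth) & 1]
    order_l, slots_l = resolve(left, depth + 1)
    order_r, slots_r = resolve(right, depth + 1)
    # One slot is consumed by the collision itself before the split.
    return order_l + order_r, 1 + slots_l + slots_r
```

The average of that slot count over random collision sets is what the upper bound in the analysis captures, and it feeds directly into the channel-utilization computation.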

Journal ArticleDOI
TL;DR: This paper presents PARDIS, a system which addresses this demand by providing support for interoperability of PARallel DIStributed applications by introducing SPMD objects representing data-parallel computations, and presents microbenchmark results which evaluate the performance potential of SPMD objects for data structures of diverse complexity and different network configurations.
Abstract: To fully realize its potential, distributed supercomputing requires abstractions and environments facilitating development of efficient applications. In this paper we present PARDIS, a system which addresses this demand by providing support for interoperability of PARallel DIStributed applications. The design of PARDIS is based on the Common Object Request Broker Architecture (CORBA). Like CORBA, it provides interoperability between heterogeneous components by specifying their interfaces in a meta-language, the CORBA IDL, which can be translated into the language of interacting components. However, PARDIS extends the CORBA object model by introducing SPMD objects representing data-parallel computations. This extension allows us to build interactions involving data-parallel components, which exchange distributed data structures whose definitions are captured by distributed sequences. We present microbenchmark results which evaluate the performance potential of SPMD objects for data structures of diverse complexity and different network configurations. Based on these results, we conclude that while encapsulating the existence of multiple interactions SPMD objects also allow their efficient utilization, and therefore constitute a useful abstraction.

Journal ArticleDOI
TL;DR: A protocol that forms the building block for implementing load balancing schemes in which multiple home agents are used to provide mobility support is designed, and the performance characteristics of three selection schemes, namely, random, round-robin, and join the shortest queue (JSQ), and three transfer policies are studied.
Abstract: Mobility support in IP networks requires servers to forward packets to mobile hosts and to maintain information pertaining to a mobile host’s location in the network. In the mobile Internet Protocol (mobile-IP), location and packet forwarding functions are provided by servers referred to as home agents. These home agents may become the bottleneck when there are a large number of mobile hosts in the network. In this paper, we consider the design and analysis of a replicated server architecture in which multiple home agents are used to provide mobility support. In order to minimize the delay across the home agents, one of the key aspects is the design of load balancing schemes in which a home agent may transfer the control of a mobile host to another home agent in the same network. The methods for triggering the transfer and the policy for selecting the next home agent define various load balancing schemes which have different performance characteristics. In this paper, we design a protocol that forms the building block for implementing such load balancing schemes, and we then study the performance characteristics of three selection schemes, namely, random, round-robin, and join the shortest queue (JSQ), and three transfer policies, namely, timer-, counter-, and threshold-based. The key results of this study are as follows: (1) The results show that both random and round-robin selection policies can yield modest load balancing gains, and that these gains increase when the traffic is more bursty (burstiness is defined as the ratio of the peak arrival rate to the mean arrival rate) as well as when there are more home agents. (2) The threshold-based transfer policy performs better than timer-based and counter-based policies, since in threshold-based policies transfers are made only when the queue is overloaded, unlike counter- and timer-based policies in which transfers can be made from an unloaded home agent to an overloaded home agent.
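The three selection schemes the study compares are simple enough to state as code. A sketch under obvious assumptions (queue lengths are known exactly; class and method names are mine, not the paper's):

```python
import itertools
import random

class HomeAgentSelector:
    """Pick which home agent takes over the next mobile host, using one of
    the three selection schemes from the study: random, round-robin, or
    join the shortest queue (JSQ)."""

    def __init__(self, policy):
        self.policy = policy
        self._rr = itertools.count()   # round-robin position

    def select(self, queue_lengths):
        if self.policy == "random":
            return random.randrange(len(queue_lengths))
        if self.policy == "round-robin":
            return next(self._rr) % len(queue_lengths)
        if self.policy == "jsq":
            # JSQ needs current load information, which is why it pairs
            # naturally with the threshold-based transfer trigger.
            return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
        raise ValueError(self.policy)
```

Random and round-robin need no state exchange between home agents, which is the trade-off against JSQ's better balancing that the simulations quantify.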

Journal ArticleDOI
TL;DR: An elaborate discussion of issues resulting from this reorganization in this new paradigm taking into account both mobile and traditional clients is provided, and three different techniques for organizing data on the server based on the hoard attribute are presented.
Abstract: The use of mobile computers is gaining popularity. There is an increasing trend in the number of users with laptops, PDAs, and smart phones. Access to information repositories in the future will be dominated by mobile clients rather than traditional “fixed” clients. These mobile clients download information by periodically connecting to repositories of data stored in either databases or file systems. Such mobile clients constitute a new and different kind of workload and exhibit a different access pattern than seen in traditional client-server systems. Though file systems have been modified to handle clients that can download information, disconnect, and later reintegrate, databases have not been redesigned to accommodate mobile clients. There is a need to support mobile clients in the context of client-server databases. This paper is about organizing the database server to take into consideration the access patterns of mobile clients. We propose the concept of hoard attributes, which capture these access patterns. Three different techniques for organizing data on the server based on the hoard attribute are presented. We argue that each technique is suited for a particular workload. The workload is a combination of requests from mobile clients and traditional clients. This reorganization also allows us to address issues of concurrency control, disconnection, and replica control in mobile databases. We present simulation results that show the performance of server reorganization using hoard attributes. We also provide an elaborate discussion of issues resulting from this reorganization in this new paradigm, taking into account both mobile and traditional clients.

Journal ArticleDOI
TL;DR: This paper considers the vertical dependencies between various layers in the protocol stack, studying the performance of the Network File System under various error models and improvement techniques, and improves NFS performance by implementing changes to the application level reliability mechanisms.
Abstract: Wireless networks experience a high level of errors and losses. These physical layer characteristics have an impact on the performance of the higher layers. In addition, the performance of each protocol layer is contingent on the behavior of the other layers. Vertical dependency is a term which describes this inter-dependence between layers. In the wireless and mobile environment, the effects of vertical dependence are particularly pronounced due to the dynamic nature of the environment and due to the fact that traditional assumptions about protocol layer interactions do not always hold. In this paper, we consider the vertical dependencies between various layers in the protocol stack, studying the performance of the Network File System under various error models and improvement techniques. Our experimental results demonstrate the dependency of the application performance on the details of the error characteristics and other protocol layers. After studying the vertical dependencies, we improve NFS performance by implementing changes to the application level reliability mechanisms. Understanding of the vertical dependencies enables development of effective methods for performance enhancement and efficient reaction to errors and changes on the wireless media.

Journal ArticleDOI
TL;DR: This work presents a prefetching technique that can avoid the delay in transferring pages from the server to the client, especially where the client application requests pages from several database servers.
Abstract: Given the existence of powerful multiprocessor client workstations in many client-server object database applications, the performance bottleneck is the delay in transferring pages from the server to the client. We present a prefetching technique that can avoid this delay, especially where the client application requests pages from several database servers. This technique has been added to the EXODUS storage manager. Part of the novelty of this approach lies in the way that multithreading on the client workstation is exploited, in particular for activities such as prefetching and flushing dirty pages to the server. Using our own complex object benchmark, we analyze the performance of the prefetching technique with multiple clients and multiple servers. The technique is also tested under a variety of client host workload levels.
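The multithreading angle the abstract highlights, i.e. using spare client threads so page fetches overlap with computation, can be sketched with standard Python threading. This is a generic illustration of the pattern, not the EXODUS storage manager's mechanism; the function names and single-worker structure are assumptions.

```python
import queue
import threading

def start_prefetcher(fetch_page, requests):
    """Fetch pages on a background client thread so the application
    thread is not blocked on server round-trips."""
    results = {}
    q = queue.Queue()

    def worker():
        while True:
            page_id = q.get()
            if page_id is None:          # sentinel: no more prefetch requests
                break
            # In a real client this round-trip overlaps with computation
            # happening on the application thread.
            results[page_id] = fetch_page(page_id)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    for r in requests:
        q.put(r)
    q.put(None)
    t.join()
    return results
```

The same worker structure extends to the paper's other background activity, flushing dirty pages back to the server, by queueing write-backs instead of reads.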

Journal ArticleDOI
TL;DR: In this article, the authors consider parallel computing on a network of workstations using a connection-oriented protocol (e.g., Asynchronous Transfer Mode) for data communication and present a strategy based on dynamic redistribution of data points to reduce the bottlenecks caused by unequal bandwidths.
Abstract: We consider parallel computing on a network of workstations using a connection-oriented protocol (e.g., Asynchronous Transfer Mode) for data communication. In a connection-oriented protocol, a virtual circuit of guaranteed bandwidth is established for each pair of communicating workstations. Since all virtual circuits do not have the same guaranteed bandwidth, a parallel application must deal with the unequal bandwidths between workstations. Since most works in the design of parallel algorithms assume equal bandwidths on all the communication links, they often do not perform well when executed on networks of workstations using connection-oriented protocols. In this paper, we first evaluate the performance degradation caused by unequal bandwidths on the execution of conventional parallel algorithms such as the fast Fourier transform and bitonic sort. We then present a strategy based on dynamic redistribution of data points to reduce the bottlenecks caused by unequal bandwidths. We also extend this strategy to deal with processor heterogeneity. Using analysis and simulation we show that there is a considerable reduction in the runtime if the proposed redistribution strategy is adopted. The basic idea presented in this paper can also be used to improve the runtimes of other parallel applications in connection-oriented environments.
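The core of a redistribution strategy for unequal bandwidths is deciding how many data points each workstation should hold. One natural static rule, offered here as an illustration of the idea rather than the paper's dynamic algorithm, is bandwidth-proportional allocation with largest-remainder rounding:

```python
def redistribute(n_points, bandwidths):
    """Assign n_points data points to workstations in proportion to the
    guaranteed bandwidth of their virtual circuits."""
    total = sum(bandwidths)
    shares = [n_points * b / total for b in bandwidths]
    alloc = [int(s) for s in shares]
    # Hand leftover points to the largest fractional remainders so the
    # allocation sums exactly to n_points.
    leftovers = sorted(range(len(shares)),
                       key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in leftovers[:n_points - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

The same proportional rule extends to processor heterogeneity by weighting each workstation with a combined bandwidth-and-compute factor, mirroring the extension described in the abstract.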

Journal ArticleDOI
TL;DR: Numerical results show that wired resource reservation methods can flexibly cope with the time-variant environment and meet the QoS requirements on the inter–cluster handoff calls.
Abstract: On the ATM-based wired/wireless integrated network, we propose a connection re-routing method which reduces the inter-cluster handoff delay by reserving VPI/VCIs for possible inter-cluster handoff calls in advance. Additionally, we propose wired resource reservation methods, namely the auxiliary method and the split method, for handoff QoS guarantees for the various expected services. These methods reserve wired connection resources based on information about possible inter-cluster handoff calls. With mathematical analysis, we also propose an algorithm and a cost function for deciding the optimal amount of resources to reserve. Numerical examples show that the auxiliary method effectively reduces the cost in all cases (α > β, α = β, and α < β). The split method, however, reduces cost only when the total resource capacity C_T is relatively small and handoff calls have priority over new calls. The numerical results show that these reservation methods can flexibly cope with the time-variant environment and meet the QoS requirements of inter-cluster handoff calls.