scispace - formally typeset

Papers by Dongsoo Han published in 2001


Proceedings Article
26 Jun 2001
TL;DR: Two mapping schemes for irregular cluster systems, which try to map the nearest neighbors in the process topology to physically adjacent processors, are proposed and an application-oriented performance metric, weighted cardinality, is introduced to represent the quality of mapping.
Abstract: Mapping a virtual process topology onto a physical processor topology is one of the most important issues in parallel computing. The mapping problem for switch-based cluster systems with irregular topologies is very complicated because of the irregular connections and complex routing. This paper proposes two mapping schemes for irregular cluster systems that try to map nearest neighbors in the process topology to physically adjacent processors. In addition, an application-oriented performance metric, weighted cardinality, is introduced to represent the quality of a mapping. A simulation study shows that, for a 16×16 mesh virtual topology, the proposed mapping schemes yield better mapping quality and about 15 to 20% shorter communication latency than random mapping. The proposed algorithms should also be beneficial for metacomputing and cluster-of-clusters systems, where communication costs differ by an order of magnitude depending on the relative positions of the processor nodes.

18 citations
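The abstract does not define weighted cardinality, so this is only a minimal sketch of a simpler stand-in for "mapping quality", assuming a switch graph with processors attached to switches: for every nearest-neighbour pair in a virtual mesh, it measures how many switch hops apart the two processes land and reports the histogram and mean. The topology, the processors-per-switch layout, and all function names are hypothetical, and neither mapping shown is one of the paper's two schemes.

```python
from collections import Counter, deque
import random


def switch_distances(switch_adj):
    """All-pairs hop distances between switches, one BFS per switch (graph assumed connected)."""
    dist = {}
    for src in switch_adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in switch_adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def mesh_neighbors(rows, cols):
    """Nearest-neighbour pairs of a rows x cols virtual mesh; process id = r * cols + c."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            p = r * cols + c
            if c + 1 < cols:
                pairs.append((p, p + 1))      # east neighbour
            if r + 1 < rows:
                pairs.append((p, p + cols))   # south neighbour
    return pairs


def mapping_profile(mapping, proc_switch, switch_adj, rows, cols):
    """Histogram: switch-hop distance -> number of virtual-neighbour pairs at that distance."""
    dist = switch_distances(switch_adj)
    hist = Counter()
    for a, b in mesh_neighbors(rows, cols):
        sa, sb = proc_switch[mapping[a]], proc_switch[mapping[b]]
        hist[dist[sa][sb]] += 1
    return hist


def mean_hops(hist):
    """Average switch-hop distance over all virtual-neighbour pairs."""
    return sum(d * n for d, n in hist.items()) / sum(hist.values())


# Hypothetical irregular topology: 4 switches with 4 processors each.
switch_adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
proc_switch = [p // 4 for p in range(16)]   # processor p hangs off switch p // 4

row_mapping = list(range(16))               # process p -> processor p (one mesh row per switch)
random_mapping = row_mapping[:]
random.Random(1).shuffle(random_mapping)    # baseline: random placement

print(dict(mapping_profile(row_mapping, proc_switch, switch_adj, 4, 4)))  # {0: 12, 1: 12}
print(mean_hops(mapping_profile(random_mapping, proc_switch, switch_adj, 4, 4)))
```

Placing each mesh row on one switch keeps every horizontal pair on the same switch and every vertical pair one hop apart; a random placement typically spreads neighbours further, which is the effect a locality-aware mapping tries to avoid.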


01 Jan 2001
TL;DR: In this article, the authors propose a Barrier Tree for Meshes (BTM) scheme to minimize barrier synchronization latency on two-dimensional (2D) meshes. Nonmember nodes neither take part in constructing a BTM nor actively participate in the synchronization operations, which avoids interference among different process groups during synchronization.
Abstract: This paper proposes a Barrier Tree for Meshes (BTM) to minimize barrier synchronization latency on two-dimensional (2D) meshes. The proposed BTM scheme has two distinguishing features. First, the synchronization tree is 4-ary. The synchronization latency of the BTM scheme is asymptotically $\Theta(\log_4 n)$, while that of the fastest scheme reported in the literature is bounded between $\Omega(\log_3 n)$ and $\Theta(n^{1/2})$, where $n$ is the number of member nodes. Second, nonmember nodes neither take part in constructing a BTM nor actively participate in the synchronization operations, which avoids interference among different process groups during synchronization. This not only results in low setup overhead but also reduces the synchronization latency. The low setup overhead is particularly beneficial for the dynamic process model provided in MPI-2. An extensive simulation study shows that, for meshes of up to 64×64 nodes, the BTM scheme yields about 40 to 70 percent shorter synchronization latency and is more scalable than conventional schemes.

14 citations


Journal Article
TL;DR: This paper proposes a Barrier Tree for Meshes (BTM) to minimize barrier synchronization latency on two-dimensional (2D) meshes and shows that it yields about 40 to 70 percent shorter synchronization latency and is more scalable than conventional schemes.
Abstract: This paper proposes a Barrier Tree for Meshes (BTM) to minimize barrier synchronization latency on two-dimensional (2D) meshes. The proposed BTM scheme has two distinguishing features. First, the synchronization tree is 4-ary. The synchronization latency of the BTM scheme is asymptotically $\Theta(\log_4 n)$, while that of the fastest scheme reported in the literature is bounded between $\Omega(\log_3 n)$ and $\Theta(n^{1/2})$, where $n$ is the number of member nodes. Second, nonmember nodes neither take part in constructing a BTM nor actively participate in the synchronization operations, which avoids interference among different process groups during synchronization. This not only results in low setup overhead but also reduces the synchronization latency. The low setup overhead is particularly beneficial for the dynamic process model provided in MPI-2. An extensive simulation study shows that, for meshes of up to 64×64 nodes, the BTM scheme yields about 40 to 70 percent shorter synchronization latency and is more scalable than conventional schemes.

13 citations
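To see where a $\Theta(\log_k n)$ barrier latency comes from, here is a minimal sketch that arranges only the member nodes of a barrier in a heap-ordered k-ary tree and counts message steps: arrivals combine up the tree, then a release travels back down. It is an illustration under simplifying assumptions, not the BTM construction: it ignores mesh positions, routing, and link contention, and the function names are hypothetical.

```python
def kary_tree_depth(n, k):
    """Levels below the root in a complete k-ary tree that holds n nodes."""
    if n < 1:
        raise ValueError("a barrier needs at least one member")
    depth, capacity, level_size = 0, 1, 1
    while capacity < n:
        level_size *= k          # nodes on the next level
        capacity += level_size   # total nodes a tree of this depth can hold
        depth += 1
    return depth


def build_kary_tree(members, k):
    """Heap-ordered k-ary tree over member nodes only: member -> parent (root -> None)."""
    parent = {members[0]: None}
    for i in range(1, len(members)):
        parent[members[i]] = members[(i - 1) // k]
    return parent


def barrier_steps(n, k):
    """Message steps for one barrier: combine arrivals up the tree, then release down it."""
    return 2 * kary_tree_depth(n, k)


# All nodes of a 64x64 mesh as barrier members, for fan-outs 2, 3 and 4.
for k in (2, 3, 4):
    print(k, barrier_steps(64 * 64, k))   # prints 2 24, then 3 16, then 4 12
```

Because nonmember nodes never appear in `build_kary_tree`, a small process group pays only for its own tree depth; the fan-out of 4 mirrors the 4-ary choice described in the abstract.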


Journal Article
TL;DR: Using the information about potential conflicts detected by the analysis, workflow designers can prevent serious problems that conflicts could cause at runtime and can verify whether their workflow definitions are free from such conflicts.

10 citations


Proceedings Article
23 Apr 2001
TL;DR: A set-based constraint system is proposed to analyze possible read-write and write-write conflicts between activities that read and write shared variables in a workflow process definition.
Abstract: An erroneous workflow definition can cause serious problems for an enterprise, especially when it involves mission-critical business processes. Concurrency in workflow processes is known to be one of the major sources of such invalid workflow definitions, so conflicts caused by concurrent workflow processes should be considered carefully when defining them. However, it is very difficult to determine whether a workflow process is free from conflicts without experimental executions at runtime, which is tedious and time-consuming work for process designers. Analyzing the conflicts inherent in a concurrent workflow definition before runtime would therefore be very helpful to business process designers and many other users of workflow management systems. The authors propose a set-based constraint system to analyze possible read-write and write-write conflicts between activities that read and write shared variables in a workflow process definition. The system has two phases: the first generates set constraints from a structured workflow definition, and the second finds the minimal solution of those set constraints.

7 citations
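The kind of conflict analysis described above can be illustrated with a small sketch. It assumes a structured workflow built only from sequence (`Seq`) and parallel (`Par`) blocks, with each activity annotated by the shared variables it reads and writes; all names are hypothetical. Rather than generating and solving set constraints as the paper does, it directly collects the read and write sets of activities in concurrent branches and reports read-write and write-write overlaps.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Activity:
    name: str
    reads: frozenset
    writes: frozenset


@dataclass(frozen=True)
class Seq:          # children run one after another
    parts: tuple


@dataclass(frozen=True)
class Par:          # branches may run concurrently (AND-split / AND-join)
    branches: tuple


def children(node):
    return node.parts if isinstance(node, Seq) else node.branches


def activities(node):
    """All activities contained in a structured workflow fragment."""
    if isinstance(node, Activity):
        return [node]
    return [a for child in children(node) for a in activities(child)]


def conflicts(node):
    """(kind, variable, activity, activity) for every conflict between concurrent branches."""
    if isinstance(node, Activity):
        return set()
    found = set()
    for child in children(node):
        found |= conflicts(child)           # conflicts nested deeper in the structure
    if isinstance(node, Par):
        for left, right in combinations(node.branches, 2):
            for x in activities(left):
                for y in activities(right):
                    pair = tuple(sorted((x.name, y.name)))
                    for v in x.writes & y.writes:
                        found.add(("write-write", v) + pair)
                    for v in (x.reads & y.writes) | (x.writes & y.reads):
                        found.add(("read-write", v) + pair)
    return found


def act(name, reads=(), writes=()):
    return Activity(name, frozenset(reads), frozenset(writes))


# Hypothetical order-handling process with one parallel block.
order_flow = Seq((
    act("receive", writes={"order"}),
    Par((
        act("check_credit", reads={"order"}, writes={"status"}),
        act("check_stock", reads={"order"}, writes={"status"}),
        act("notify", reads={"status"}),
    )),
    act("ship", reads={"status"}),
))

for c in sorted(conflicts(order_flow)):
    print(c)
```

For this hypothetical flow the three concurrent activities conflict on `status`: `check_credit` and `check_stock` both write it, and `notify` reads it while either may be writing. Activities in sequence, such as `receive` and `ship`, are never reported.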


Proceedings Article
01 Jun 2001
TL;DR: An analytic performance model is described, and its analysis results are presented to help in understanding the spectrum of possibilities for large-scale workflow architectures, in particular how well client-server workflow architectures cope with a large number of workcases.
Abstract: We devise an analytic performance model and describe its analysis results, which we expect to help in understanding the spectrum of possibilities for large-scale workflow architectures. The analytic model is extended to represent several types of client-server workflow architectures. In particular, we focus on performance estimates for conventional workflow management systems, which are characterized by client-server workflow architectures. Developing a workflow management system is typically a large and complex task. Decisions must be made about the hardware and software platforms, the data structures, the algorithms, and the interconnection of the various modules used by users and administrators. These design decisions are further complicated by requirements such as scalability, flexibility, robustness, speed, and usability. We are particularly concerned with scalability, that is, how well a client-server workflow architecture handles a large number of workcases. Finally, we graphically compare the performance evaluation results for several types of client-server workflow architectures, covering single-server and multiple-server workflow systems in a distributed environment.

6 citations
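The abstract does not state which analytic model is used. As a hedged illustration of how single-server and multiple-server architectures can be compared analytically, this sketch assumes each workflow server behaves as an independent M/M/1 queue (Poisson arrivals of workcase requests, exponential service times) and that load is split evenly across servers; the function names and rates are made up.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system of an M/M/1 queue, W = 1 / (mu - lambda); infinite if lambda >= mu."""
    if arrival_rate >= service_rate:
        return float("inf")   # unstable: the backlog grows without bound
    return 1.0 / (service_rate - arrival_rate)


def multi_server_response_time(arrival_rate, service_rate, servers):
    """`servers` independent M/M/1 workflow servers with workcases split evenly (an assumption)."""
    return mm1_response_time(arrival_rate / servers, service_rate)


# Hypothetical rates: each server completes 10 workcase requests per second.
for load in (4, 8, 12):
    times = [multi_server_response_time(load, 10.0, k) for k in (1, 2, 4)]
    print(load, " ".join(f"{w:.3f}" for w in times))
```

Under these assumptions the mean response time of a single server rises sharply as it approaches saturation and becomes unbounded beyond it, while adding servers keeps each one lightly loaded; this is the kind of scalability question the paper raises, though a real WFMS would also have to account for network and database tiers.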


Proceedings Article
23 Apr 2001
TL;DR: This work analyzes the type and scope of disconnected service in WFMSs through an in-depth analysis of workflow task models, and presents and discusses four general issues that should be addressed to support disconnected operation in WFMSs.
Abstract: With improvements in network and computing environments in both wireless and wired areas, mobile or portable devices such as palmtops and notebooks have become prevalent. They are even considered essential ingredients of the business workplace. This mobile business environment should therefore be accommodated in business automation systems such as workflow management systems (WFMSs). Researchers in various fields have focused on disconnected operation to provide continuous and safe services for mobile devices. However, disconnected operation has not been fully addressed in the area of WFMSs. We analyze the type and scope of disconnected service in WFMSs through an in-depth analysis of workflow task models. Based on this analysis, we present and discuss four general issues that should be addressed to support disconnected operation in WFMSs. The discussion focuses mainly on voluntary disconnection in the wired environment and covers issues such as task classification, handling of task-relevant data, application handling, and task state emulation.

5 citations


Proceedings Article
26 Jun 2001
TL;DR: This paper presents a practical loop transformation technique that can significantly reduce the monitoring overhead required for detecting races on-the-fly in parallel programs, by transforming each parallel loop so as to minimize the number of its iterations that must be monitored.
Abstract: Races can cause unintended nondeterministic execution of parallel programs, so race detection is one of the critical issues to be resolved in debugging shared-memory parallel programs. On-the-fly race detection techniques have been developed as one approach to this problem. However, on-the-fly race detection techniques suffer from huge run-time overhead because the whole execution behavior of the program being debugged must be monitored at run time. In this paper we present a practical loop transformation technique that can significantly reduce the monitoring overhead required for detecting races on-the-fly in parallel programs. The technique achieves this improvement by transforming each parallel loop so as to minimize the number of its iterations that must be monitored. An experimental performance measurement shows a dramatic reduction in monitoring overhead, and the technique detects more races than traditional on-the-fly techniques.

3 citations
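The abstract does not describe the transformation itself, so this sketch only makes the monitoring overhead concrete. It is a minimal access-history race check for a single parallel loop whose iterations are all logically concurrent; the class and method names are hypothetical, and it keeps complete reader and writer sets, which practical detectors avoid. Every monitored access costs a check against the recorded history, so the work grows with the number of iterations times the accesses per iteration, which is the quantity the paper's technique targets by reducing the number of iterations that must be monitored.

```python
class ParallelLoopRaceDetector:
    """Access-history race check for one parallel loop whose iterations are all concurrent.

    Keeps complete reader/writer sets per variable, which is simple but unbounded;
    practical on-the-fly detectors keep a much smaller access history.
    """

    def __init__(self):
        self.readers = {}   # variable -> iterations that have read it
        self.writers = {}   # variable -> iterations that have written it
        self.races = set()  # (variable, iteration, iteration)
        self.checks = 0     # monitored accesses: a rough proxy for run-time overhead

    def _report(self, var, others, it):
        for other in others:
            if other != it:   # accesses within one iteration are ordered, not racing
                self.races.add((var, min(other, it), max(other, it)))

    def read(self, it, var):
        self.checks += 1
        self._report(var, self.writers.get(var, ()), it)   # read vs recorded writes
        self.readers.setdefault(var, set()).add(it)

    def write(self, it, var):
        self.checks += 1
        # A write conflicts with any recorded read or write from another iteration.
        self._report(var, self.readers.get(var, set()) | self.writers.get(var, set()), it)
        self.writers.setdefault(var, set()).add(it)


# Simulated trace of: parallel for i in 0..3 { a[i] = a[i] + 1; s = s + a[i] }
det = ParallelLoopRaceDetector()
for i in range(4):
    det.read(i, f"a[{i}]")
    det.write(i, f"a[{i}]")
    det.read(i, "s")
    det.write(i, "s")

print(det.checks, sorted(det.races))
```

The simulated trace needs 16 checks and reports all six iteration pairs racing on `s`, while the per-iteration `a[i]` accesses are race-free; doubling the iteration count doubles the checks, which is why limiting the iterations to be monitored pays off.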