
Showing papers presented at "Parallel and Distributed Processing Techniques and Applications in 2006"


Proceedings Article
01 Jan 2006
TL;DR: This work presents an updated solution for Linux 2.6, which uses the more recent NPTL (Native POSIX Thread Library) and can take advantage of the ELF architecture to eliminate the earlier requirement to patch the user’s main routine.
Abstract: Checkpointing of single-threaded applications has been long studied [3], [6], [8], [12], [15]. Much less research has been done for user-level checkpointing of multithreaded applications. Dieter and Lumpp studied the issue for LinuxThreads in Linux 2.2. However, that solution does not work on later versions of Linux. We present an updated solution for Linux 2.6, which uses the more recent NPTL (Native POSIX Thread Library). Unlike the earlier solution, we do not need to patch glibc. Additionally, the new implementation can take advantage of the ELF architecture to eliminate the earlier requirement to patch the user’s main routine. This fills in the missing link for full transparency. As one demonstration of the robustness, we checkpoint the Kaffe Java Virtual Machine including any of several multithreaded Java programs running on top of it.
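The approach above checkpoints the full process image, threads included, with no application changes. As a much weaker illustration of the general idea of user-level checkpoint/restart, an application can periodically serialize its own state and resume from the last saved copy; the sketch below (file name and state layout are arbitrary choices, and this is not the authors' mechanism) shows that pattern:

```python
import os
import pickle

CKPT = "state.ckpt"

def checkpoint(state):
    # Write atomically: serialize to a temp file, then rename.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT)

def restart(default):
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return default

# A long-running loop that survives being killed and re-run.
state = restart({"i": 0, "total": 0})
while state["i"] < 10:
    state["total"] += state["i"]
    state["i"] += 1
    checkpoint(state)
print(state["total"])  # 45
```

Kill and re-run the script at any point and the loop resumes from the last checkpoint rather than from scratch; transparent full-process checkpointing, as in the paper, needs no such cooperation from the application.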

56 citations


Proceedings Article
01 Jan 2006
TL;DR: A new grey-based approach to deal with selection problem of suppliers is proposed, which uses grey possibility degree to determine the ranking order of all alternatives.
Abstract: Supplier selection is a multiple attribute decision making (MADM) problem. Since the decision maker's (DM's) preferences on alternatives or on attributes of suppliers are often uncertain, the selection of good suppliers is difficult. Grey theory is one of the methods used to study uncertainty; it is well suited to the mathematical analysis of systems with uncertain information. In this paper, we propose a new grey-based approach to the supplier selection problem. Briefly, the procedure is as follows: first, the weights and ratings of attributes for all alternatives are described by linguistic variables that can be expressed as grey numbers; second, the grey possibility degree is used to determine the ranking order of all alternatives; finally, an example of a supplier selection problem illustrates the proposed approach.
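The grey possibility degree that drives the ranking step can be sketched directly. The function below assumes the common interval-grey-number definition P{g1 <= g2} = max(0, L* - max(0, b - c)) / L*, where g1 = [a, b], g2 = [c, d], and L* is the sum of the two interval lengths; the paper's exact formulation may differ:

```python
def grey_possibility(g1, g2):
    """Possibility degree P{g1 <= g2} for interval grey numbers (lower, upper):
    P = max(0, L* - max(0, b - c)) / L*, where L* = len(g1) + len(g2)."""
    a, b = g1
    c, d = g2
    length = (b - a) + (d - c)
    if length == 0:                      # two crisp (white) numbers
        return 0.5 if b == d else float(b < d)
    return max(0.0, length - max(0.0, b - c)) / length

# Pairwise comparison of two suppliers' aggregated grey ratings:
# a high P{s1 <= s2} means s1 is "probably smaller", so s2 ranks higher.
s1, s2 = (6.0, 8.0), (7.5, 9.5)
print(grey_possibility(s1, s2))  # 0.875
```

Note the boundary behavior: disjoint intervals give 0 or 1, and identical intervals give 0.5.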

33 citations


Proceedings Article
01 Jan 2006
TL;DR: An anonymous self-stabilizing algorithm for finding a 1-maximal matching in trees, and rings of length not divisible by 3 that converges in O(n) moves under an arbitrary central daemon is presented.
Abstract: We present an anonymous self-stabilizing algorithm for finding a 1-maximal matching in trees, and rings of length not divisible by 3. We show that the algorithm converges in O(n) moves under an arbitrary central daemon.
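For intuition, a matching is 1-maximal when it cannot be improved by removing one matched edge and adding two others. The sequential sketch below illustrates that improvement step on a path graph; it is only an illustration of the property, not the paper's anonymous self-stabilizing protocol, which runs as local moves under a central daemon:

```python
def greedy_maximal_matching(edges):
    # Maximal: no further edge can be added, though not necessarily maximum.
    matched, M = set(), set()
    for u, v in edges:
        if u not in matched and v not in matched:
            M.add((u, v))
            matched.update((u, v))
    return M

def improve_once(edges, M):
    """The 1-maximal step: drop one matched edge and add two new ones."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    matched = {x for e in M for x in e}
    for u, v in list(M):
        free_u = [w for w in adj[u] - {v} if w not in matched]
        free_v = [w for w in adj[v] - {u} if w not in matched]
        for a in free_u:
            for b in free_v:
                if a != b:
                    M.remove((u, v))
                    M.update({(u, a), (v, b)})
                    return True
    return False

# Path a-b-c-d: a greedy pass that picks (b, c) first gets stuck at size 1;
# one 1-maximal improvement step reaches the maximum matching of size 2.
edges = [("b", "c"), ("a", "b"), ("c", "d")]
M = greedy_maximal_matching(edges)
while improve_once(edges, M):
    pass
print(sorted(M))  # [('b', 'a'), ('c', 'd')]
```

A 1-maximal matching is guaranteed to be at least 2/3 the size of a maximum matching, which is the attraction over plain maximal matching.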

27 citations


Proceedings Article
01 Jan 2006
TL;DR: The motivations for this infrastructure and the revisions that should make it a general purpose solution for users on PSC’s Cray XT3 and other compute platforms are presented.
Abstract: Portals Direct I/O ("PDIO") is a special-purpose middleware infrastructure for writing data from compute processor memory on Portals-enabled compute nodes to remote agents anywhere on the WAN in real time. The prototype implementation provided a means for aggregation of outgoing data through multiple load-balanced routing daemons, end-to-end parallel data streams through externally connected "I/O nodes", and a bandwidth feedback mechanism for stability and robustness. It was used by one research group, demonstrated live at several conferences, and shown to deliver bandwidths of up to 800 Mbit/s. Although the prototype met the initial design requirements for the target application, it had some limitations due to the special-purpose nature of that design. Based on experiences with that implementation, the beta version now under development has a number of interface, functionality and performance enhancements. We present the motivations for this infrastructure and the revisions that should make it a general-purpose solution for users on PSC's Cray XT3 and other compute platforms.

20 citations


Proceedings Article
01 Jan 2006
TL;DR: This paper proposes a cluster-based routing algorithm to extend the lifetime of the network and to maintain balanced energy consumption across nodes; a tiny slot added to the round frame enables the exchange of residual-energy messages among the base station, cluster heads, and nodes.
Abstract: Efficient node-energy utilization is an important performance factor in wireless sensor networks because sensor nodes operate on limited battery power. In this paper, we propose a cluster-based routing algorithm to extend the lifetime of the network and to maintain balanced energy consumption across nodes. To achieve this, we add a tiny slot to the round frame, which enables the exchange of residual-energy messages among the base station (BS), cluster heads, and nodes. The slot is used in the pre-setup phase. The performance of the proposed protocol has been examined and evaluated with the NS-2 simulator. The simulation results confirm that the proposed algorithm outperforms LEACH in terms of network lifetime. Moreover, in simulations with a large number of nodes (1000 or more), we expect our protocol to extend network lifetime even more clearly compared to LEACH. Consequently, the proposed protocol can effectively extend network lifetime without other critical overheads or performance degradation.
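The baseline the authors compare against is LEACH, whose randomized cluster-head election is easy to sketch. The threshold formula below is the standard LEACH one (not the authors' modified protocol), applied to nodes that have not yet served as head in the current epoch:

```python
import random

def leach_threshold(p, r):
    """Standard LEACH election threshold T(n) for round r and desired
    cluster-head fraction p (applies to nodes that have not served as
    head in the current epoch of 1/p rounds)."""
    return p / (1.0 - p * (r % int(1.0 / p)))

def elect_heads(node_ids, p, r, rng):
    # Each eligible node draws a uniform number and becomes a cluster
    # head for this round if it falls below the threshold.
    t = leach_threshold(p, r)
    return [n for n in node_ids if rng.random() < t]

rng = random.Random(42)
heads = elect_heads(range(100), p=0.05, r=0, rng=rng)
print(len(heads))  # about 5 expected out of 100; exact count varies with the seed
```

The threshold climbs to 1.0 by the last round of each epoch, guaranteeing every node serves as head once per epoch, which is the rotation that balances energy drain.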

15 citations


Proceedings Article
01 Jan 2006
TL;DR: This paper discusses the experiences in porting FLASH to the BG/L platform and how it dealt with issues of optimization, algorithm scalability, and I/O, and describes the challenges of managing terabytes of data in millions of files generated by the production run on the Livermore BG/L.
Abstract: The new BG/L machine from IBM has a massively parallel computing paradigm of large numbers of relatively slow and simple processors, and implements parallelism on a scale an order of magnitude larger than other available platforms. However, this hardware architecture presents opportunities and challenges to high performance software applications. The FLASH code, developed at the Center for Astrophysical Thermonuclear Flashes at the University of Chicago, is a modular, adaptive mesh code used for simulating compressible, reactive flows found in astrophysical environments. FLASH was instrumental in the early testing of the system software on the prototype BG/L platform. Later, the FLASH Center produced a huge science run at the Lawrence Livermore National Laboratory, on the largest current BG/L installation. In this paper we discuss our experiences in porting FLASH to the BG/L platform and how we dealt with issues of optimization, algorithm scalability, and I/O. We also describe the challenges of managing terabytes of data in millions of files generated by the production run on the Livermore BG/L.

15 citations


Proceedings Article
01 Jan 2006
TL;DR: A comprehensive analytic model is presented to study the interplay among the number of parallel processors, the maximum degree of processor sharing, the overhead, and the job arrival rate to examine how the CPU time distribution affects mean system time.
Abstract: In principle, unrestricted processor sharing can be very useful when jobs with widely varying CPU requirements compete for the same processor; even when several processors are available, processor sharing can help. In practice, however, it must be implemented by round-robin, and there is an overhead cost (e.g., cache thrashing) to this scheme. Furthermore, the overhead may depend on the number of active jobs and can be significant. Therefore restricted processor sharing, which only allows a limited number of jobs to share the processors, may be a more appropriate strategy. In this paper we present a comprehensive analytic model to study the interplay among the number of parallel processors, the maximum degree of processor sharing, the overhead, and the job arrival rate. We examine how the CPU time distribution affects mean system time (or response time), under what conditions two slow processors are better than one processor of twice the speed, and when it pays to invoke restricted processor sharing.
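A toy round-robin simulation makes the core trade-off concrete: a fixed context-switch cost charged at every quantum boundary inflates the mean response time. This is an illustrative sketch, not the paper's analytic model:

```python
from collections import deque

def round_robin(demands, quantum, switch_cost):
    """Single-CPU round-robin; returns the mean response time.
    switch_cost is paid at every quantum boundary (context-switch overhead)."""
    ready = deque(range(len(demands)))
    remaining = list(demands)
    t, finish = 0.0, [0.0] * len(demands)
    while ready:
        i = ready.popleft()
        run = min(quantum, remaining[i])
        t += run + switch_cost
        remaining[i] -= run
        if remaining[i] > 1e-12:
            ready.append(i)       # not done: back of the queue
        else:
            finish[i] = t
    return sum(finish) / len(finish)

jobs = [3.0, 1.0, 7.0]
no_overhead = round_robin(jobs, quantum=1.0, switch_cost=0.0)
with_overhead = round_robin(jobs, quantum=1.0, switch_cost=0.1)
print(no_overhead < with_overhead)  # True: overhead inflates response time
```

Capping the number of jobs admitted to the round-robin queue (restricted processor sharing) bounds how many quantum boundaries, and hence how much overhead, a job suffers before completing.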

14 citations


Proceedings Article
01 Jan 2006
TL;DR: This paper defines a methodology for quantifiably determining a resource allocation's ability to satisfy a QoS constraint in the midst of uncertainty in system parameters, and the established stochastic model is employed to develop greedy resource allocation heuristics.
Abstract: This research investigates the problem of robust resource allocation for a large class of systems operating on periodically updated data sets under an imposed quality of service (QoS) constraint. Such systems are expected to function in an environment replete with uncertainty, where the workload is likely to fluctuate substantially. Determining a resource allocation that accounts for this uncertainty in a way that can provide a probabilistic guarantee that a given level of QoS is achieved is an important research problem. First, this paper defines a methodology for quantifiably determining a resource allocation's ability to satisfy a QoS constraint in the midst of uncertainty in system parameters. Uncertainty in system parameters and its impact on system performance are modeled stochastically. Second, the established stochastic model is employed to develop greedy resource allocation heuristics. Finally, the utility of the proposed stochastic robustness metric and the performance of the heuristics are evaluated in a simulated environment that replicates a heterogeneous cluster-based radar system.

13 citations


Proceedings Article
01 Jan 2006
TL;DR: Two fault-tolerant multistage interconnection networks of different classes, ASENs and ZTNs, are compared based on analytic bounds of MTTF, cost, and cost-effectiveness; the study reveals that ZTNs are better in terms of cost and cost-effectiveness, while both networks are equally reliable in terms of upper bounds of MTTF.
Abstract: Two fault-tolerant multistage interconnection networks of different classes, ASENs and ZTNs, are compared based on analytic bounds of their MTTF, cost, and cost-effectiveness. The full-access condition and a dead-fault model are used for the MTTF analysis. The simulation study reveals that ZTNs are better in terms of cost and cost-effectiveness, while both networks are equally reliable in terms of upper bounds of MTTF.

12 citations


Proceedings Article
01 Jan 2006
TL;DR: The differential technique is improved and a protocol is proposed to maintain a virtual network topology of a logical ring combined with multiple computation trees so that the differential technique can be applied to dynamic systems.
Abstract: A system of vector clocks is strongly consistent and captures the happened-before relations among events in the system. These clocks underlie solutions to a number of problems in distributed systems including, among others, detecting global predicates, debugging distributed programs, causally ordering multicast messages, and implementing a distributed shared memory. In general, a data structure of size n, where n is the number of processes in the system, has to be maintained at each process and attached to each message communicated in the system to implement vector clocks. This is a considerable communication overhead in large systems. A differential technique has been proposed to reduce this required communication overhead for static systems with FIFO channels. In this study, the differential technique is improved to further reduce the required communication overhead. A protocol is proposed to maintain a virtual network topology of a logical ring combined with multiple computation trees so that the differential technique can be applied to dynamic systems. When a process leaves the system, the clock maintained at that process is taken over by another process. When a process joins the system, it inherits the causality relations maintained at the process that created it. Correctness of the protocol and the clock properties are proved as well.
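The differential idea, in the spirit of the Singhal-Kshemkalyani technique that this work improves, is that a sender transmits only the vector entries that changed since its last message to the same destination. A minimal sketch (class and method names are hypothetical):

```python
class DiffVectorClock:
    """Vector clock with a differential send: only entries that changed
    since the last message to the same destination go on the wire,
    encoded as a sparse dict {index: value}. Assumes FIFO channels."""

    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n
        self.last_sent = {}              # dest -> clock snapshot at last send

    def tick(self):
        self.clock[self.pid] += 1

    def send(self, dest):
        self.tick()
        prev = self.last_sent.get(dest, [0] * len(self.clock))
        diff = {i: v for i, v in enumerate(self.clock) if v != prev[i]}
        self.last_sent[dest] = list(self.clock)
        return diff

    def receive(self, diff):
        self.tick()
        for i, v in diff.items():
            self.clock[i] = max(self.clock[i], v)

# Three processes; P0 sends twice to P1. Each message carries only the
# single changed entry instead of the full length-3 vector.
p0, p1 = DiffVectorClock(0, 3), DiffVectorClock(1, 3)
m1 = p0.send(1)
p1.receive(m1)
m2 = p0.send(1)
print(m1, m2)  # {0: 1} {0: 2}
```

With many processes and localized communication, messages shrink from O(n) entries to the handful that actually changed, which is exactly the overhead reduction the paper targets.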

12 citations


Proceedings Article
01 Jan 2006
TL;DR: The experimental results indicate that through careful design of processor allocation and job scheduling methods, the overall system performance on a computing grid can be improved by more than a factor of 20 in terms of waiting ratio.
Abstract: Processor allocation and job scheduling are two complementary techniques for improving the performance of parallel systems. This paper presents an effort to study the issues of processor allocation and job scheduling on the emerging computing grid platform and to develop an integrated approach to efficient workload management. The experimental results indicate that through careful design of processor allocation and job scheduling methods, the overall system performance on a computing grid can be improved by more than a factor of 20 in terms of waiting ratio.

Proceedings Article
01 Jan 2006
TL;DR: The topology-aware parallel molecular dynamics algorithm, in which the processors are rearranged automatically according to resource topology so as to minimize the cost required for the simulation, is developed.
Abstract: We have developed the topology-aware parallel molecular dynamics (TAPMD) algorithm, in which the processors are rearranged automatically according to the resource topology so as to minimize the cost required for the simulation. It is demonstrated that TAPMD can reduce the communication time to less than half of the worst-case time on distributed PC clusters. This improvement is expected to be more significant when the communication time dominates the total wall-clock time and when the resource topology consists of more types of clusters. Additional tests involving several clusters with different types of connections as well as different types of processors are in progress.

Proceedings Article
01 Jan 2006
TL;DR: In this article, a new semantics-based Web service discovery framework that extends UDDI is proposed; it provides operation-level service discovery, the discovery process runs in two stages, and when no exact match is found the match engine searches for suitable services to compose the target request.
Abstract: Web services have become a new wave of Internet technology development and a new solution for dynamic business interactions over the Internet. A critical step in the process of reusing existing Web services is the discovery of potentially relevant components. Current Web services technology only provides syntactic descriptions of Web services. In this paper, we first analyze the limitations of current Web service standards and point out that semantic description is the basis for automatic service discovery. We put forward a new semantics-based Web services discovery framework that extends the UDDI framework and provides operation-level service discovery compared to the UDDI architecture. The discovery process is in two stages: if an exact match is not found, the match engine searches for suitable services to compose the target request. The composition algorithm is also analyzed, and the method of semantic Web service annotation is discussed. We use an ontology-based approach to capture real-world knowledge for semantic service annotation. The prototype we have implemented gives a whole understanding of the discovery model we have proposed.

Proceedings Article
01 Jan 2006
TL;DR: It is found that the multithreaded implementation in Java scales well with additional processors and is competitive with a C++ implementation using MPI for overall speed of execution.
Abstract: This paper examines the implementation of a multithreaded algorithm for doing collision detection and processing in Java. It examines details of an efficient implementation in Java for single threading, then describes the methods used to implement multithreading. The method described takes advantage of the spatial locality of collisional dynamics while efficiently dealing with the requirements of temporal ordering of collisions. We find that the multithreaded implementation in Java scales well with additional processors and is competitive with a C++ implementation using MPI for overall speed of execution. As such, the multithreaded framework will be advantageous for a number of different problems and analyses that are problematic in a distributed environment.
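One common way to exploit spatial locality in collision detection is a uniform-grid broad phase followed by an exact narrow phase. The sketch below (written in Python for brevity, not the paper's Java implementation) illustrates the idea:

```python
from collections import defaultdict

def candidate_pairs(positions, radius):
    """Broad phase: hash bodies into a uniform grid so that only bodies
    in the same or adjacent cells are ever tested against each other."""
    cell = 2.0 * radius

    def key(p):
        return (int(p[0] // cell), int(p[1] // cell))

    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[key(p)].append(i)
    pairs = set()
    for i, p in enumerate(positions):
        cx, cy = key(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    if j > i:
                        pairs.add((i, j))
    return pairs

def colliding(positions, radius):
    # Narrow phase: exact distance test on the candidate pairs only.
    hits = []
    for i, j in sorted(candidate_pairs(positions, radius)):
        (x1, y1), (x2, y2) = positions[i], positions[j]
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (2 * radius) ** 2:
            hits.append((i, j))
    return hits

pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
print(colliding(pts, radius=0.3))  # [(0, 1)]
```

Because the grid partitions space, each thread can own a block of cells, which is the kind of spatial decomposition that lets a multithreaded implementation scale.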

Proceedings Article
01 Jan 2006
TL;DR: An extension to the concept of vector clocks is presented and examined that is meant to overcome the vector clocks’ great drawback: that the number of processes in the distributed system has to be constant and known in advance.
Abstract: A large number of tasks in distributed systems can be traced down to the fundamental problem of attaining a consistent global view on a distributed computation. This problem is commonly solved by application of vector clocks as a means of tracing causal dependencies among the events that characterize a run of the computation. In the paper at hand, an extension to the concept of vector clocks is presented and examined that is meant to overcome the vector clocks' great drawback: that the number of processes in the distributed system has to be constant and known in advance. As an appropriate context for these dynamic vector clocks and their associated algorithms, scalar and vector clocks are analogously reinvestigated.

Proceedings Article
01 Jan 2006
TL;DR: A performance guideline, with advantages and disadvantages, that helps prospective users easily compare distributed object technologies and select one of them.
Abstract: Distributed object technologies are useful in many server settings where high performance is required. However, many such technologies exist (DCOM, CORBA, and Web services, among others), and the lack of adequate comparisons among them troubles the developers, administrators, and practitioners who must choose one. This paper presents a performance guideline, with advantages and disadvantages, that helps prospective users easily compare these technologies and select one of them.

Proceedings Article
01 Jan 2006
TL;DR: Three parallel sorting algorithms have been implemented and compared in terms of their overall execution time and the MPI library has been selected to establish the communication and synchronization between the processors.

Proceedings Article
01 Jan 2006
TL;DR: In this paper, the authors describe the implementation of a web service providing a tuple space service for distributed programming applications, which makes use of REST (Representational State Transfer) for the web service and provides an efficient, useful, easy-to-use mechanism for distributed web-service applications.
Abstract: This paper describes the implementation of a web service providing a tuple space service for distributed programming applications. Previous research has established the benefits of using tuple space-based systems, particularly with regard to simplicity. This project has developed a new web service providing a tuple space mechanism for distributed applications based on web services. The approach that has been adopted makes use of REST (Representational State Transfer) for the web service. Initial results of testing the system for a bioinformatics application indicate that the project has provided an efficient, useful, easy-to-use mechanism for distributed web-service applications.
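A tuple space offers write, non-destructive read, and destructive read operations with pattern matching. Below is a minimal in-memory sketch of those semantics (class and method names are hypothetical; the paper's system exposes such operations as a REST web service):

```python
class TupleSpace:
    """Minimal Linda-style tuple space: out = write, rd = non-destructive
    read, take = destructive read; None acts as a wildcard in patterns.
    A REST front end would map these operations onto HTTP verbs."""

    def __init__(self):
        self.tuples = []

    def out(self, t):
        self.tuples.append(tuple(t))

    def _match(self, pattern, t):
        return len(pattern) == len(t) and all(
            p is None or p == v for p, v in zip(pattern, t))

    def rd(self, pattern):
        # Return the first matching tuple without removing it.
        return next((t for t in self.tuples if self._match(pattern, t)), None)

    def take(self, pattern):
        # Remove and return the first matching tuple ("in" in Linda terms).
        t = self.rd(pattern)
        if t is not None:
            self.tuples.remove(t)
        return t

ts = TupleSpace()
ts.out(("job", 1, "pending"))
ts.out(("job", 2, "pending"))
print(ts.rd(("job", None, "pending")))  # ('job', 1, 'pending')
print(ts.take(("job", 1, None)))        # ('job', 1, 'pending')
print(ts.rd(("job", 1, None)))          # None
```

The simplicity noted in the abstract comes from this tiny interface: producers and consumers coordinate entirely through the shared space, without knowing about each other.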

Proceedings Article
01 Jan 2006
TL;DR: In this article, a compiler extension for a parallel computer language called Associative Computing (ASC) language to support multiple instruction streams in a MASC model using manager-worker paradigm is described and implemented.
Abstract: In this paper, we describe and implement compiler extension for a parallel computer language called Associative Computing (ASC) language to support multiple instruction streams in a Multiple Associative Computing (MASC) model using manager-worker paradigm. A user directed MASC directive is used to enable concurrent executions of the THEN part and the ELSE part in a parallel IFTHEN-ELSE statement by using two different workerinstruction streams. For most applications, this technique should substantially improves the performance of the system over its performance using only one instruction stream; moreover it is more effective than using multiple instruction streams to execute every parallel IF-THEN-ELSE statements found in a program. When the overhead outweighs the benefit gained from using multiple instruction streams, a user can choose to use only one instruction stream to execute the IF-THEN-ELSE statement. While not explicitly covered here, parallel CASE statements can be handled similarly.
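The idea of dedicating separate instruction streams to the two branches can be mimicked with threads standing in for worker instruction streams. This is only an analogy sketch; the MASC model is an associative architecture, not a thread library:

```python
import threading

def parallel_if_then_else(data, cond, then_fn, else_fn):
    """Apply then_fn where cond holds and else_fn elsewhere, running the
    two branches concurrently as two 'instruction streams'. The index
    sets are disjoint, so the threads never write the same slot."""
    out = [None] * len(data)
    then_idx = [i for i, x in enumerate(data) if cond(x)]
    else_idx = [i for i, x in enumerate(data) if not cond(x)]

    def run(indices, fn):
        for i in indices:
            out[i] = fn(data[i])

    t1 = threading.Thread(target=run, args=(then_idx, then_fn))
    t2 = threading.Thread(target=run, args=(else_idx, else_fn))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return out

print(parallel_if_then_else([1, -2, 3, -4],
                            cond=lambda x: x > 0,
                            then_fn=lambda x: x * 10,
                            else_fn=lambda x: 0))
# [10, 0, 30, 0]
```

The thread-spawn cost here plays the role of the instruction-stream overhead in the abstract: when the branch bodies are trivial, running them sequentially can be cheaper.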

Proceedings Article
01 Jan 2006
TL;DR: A relationship between errors of the singular value and the accuracy of the singular vector is examined, and a suitable parameter choice for the dLVv transformation is discussed by evaluating the orthogonality of the singular vectors and the errors of the singular value decomposition.
Abstract: Let a singular value of a bidiagonal matrix be known. Then the corresponding singular vector can be computed through the twisted factorization of a tridiagonal matrix by the discrete Lotka-Volterra with variable step-size (dLVv) transformation. Errors in the singular value then sensitively affect the condition number of the tridiagonal matrix. In this paper, we first examine the relationship between errors of the singular value and the accuracy of the singular vector. Secondly, we discuss a suitable parameter choice for the dLVv transformation by evaluating the orthogonality of the singular vectors and the errors of the singular value decomposition.

Proceedings Article
01 Jan 2006
TL;DR: A Double Single System Image (Middleware level and Application level) Four Tier Cluster Architecture is presented to provide complete transparency of resource management, scalable performance, and system availability and Parallel Retrieval Virtual Machine (PRVM) data structure is designed and it improves the maintainability and extensibility of the cluster system.
Abstract: The objective of content-based face recognition is to efficiently find and retrieve face images from the database that satisfy the criteria of similarity to the user's query face image. When the database is large and the face image features are complex, exhaustive search of the database and computation of the face image similarities is not expedient. We use clusters to accelerate face feature matching and to extend face image storage capacity. In our system, the face database is partitioned into small sub-databases that are distributed among the cluster computers like a disk RAID0. In this paper, we present a Double Single System Image (Middleware level and Application level) Four Tier Cluster Architecture to provide complete transparency of resource management, scalable performance, and system availability. In addition, a Parallel Retrieval Virtual Machine (PRVM) data structure is designed, which improves the maintainability and extensibility of the cluster system. We also propose Multi-process, Multi-thread and Multi-ports (MMM) techniques and a synchronized communication mechanism based on TCP/IP sockets to reliably implement parallel retrieval and face recognition between multiple clients and multiple servers. The experimental results show that the cluster face recognition system not only improves recognition speed but also extends the data capacity of the system.

Proceedings Article
01 Jan 2006
TL;DR: The properties of the JRI of the k-out-of-n system, where the system works if at least k of n independent and identically distributed components are functioning, are studied.
Abstract: Joint Reliability Importance (JRI) is a measurement of the degree of interaction between two components in a system. The value of the JRI is non-positive (non-negative) if and only if one component becomes more important when the other has failed (is functioning). In this paper, we study the properties of the JRI of the k-out-of-n system, where the system works if there are at least k functioning components among n independent and identically distributed components. The variations of the JRI with the component probability and with the number of components are analyzed. We also indicate and correct errors in a previous paper. In addition, the JRI of series-parallel systems is presented.
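For i.i.d. components, the JRI can be computed directly by conditioning on the states of the two components, using the standard definition JRI = R(1,1) - R(1,0) - R(0,1) + R(0,0):

```python
from math import comb

def rel_k_of_n(k, n, p):
    """Reliability of a k-out-of-n system with i.i.d. components of
    reliability p: probability that at least k of n are functioning."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def jri(k, n, p):
    """Joint reliability importance of two components, conditioning on
    their states: JRI = R(1,1) - R(1,0) - R(0,1) + R(0,0). By symmetry
    of i.i.d. components, R(1,0) = R(0,1)."""
    r11 = rel_k_of_n(k - 2, n - 2, p)   # both components up
    r10 = rel_k_of_n(k - 1, n - 2, p)   # exactly one of the two up
    r00 = rel_k_of_n(k,     n - 2, p)   # both components down
    return r11 - 2 * r10 + r00

# Known special cases: series (n-out-of-n) has JRI > 0, parallel
# (1-out-of-n) has JRI < 0.
print(jri(3, 3, 0.9) > 0, jri(1, 3, 0.9) < 0)  # True True
```

The sign pattern matches the interpretation in the abstract: in a series system each component matters more while the other works, and in a parallel system each matters more once the other has failed.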

Proceedings Article
01 Jan 2006
TL;DR: The article presents and evaluates a new scheduling algorithm that employs co-allocation for space slicing multi-clusters and concludes that this algorithm has potential in finding efficient and scalable scheduling algorithms for smart grids.
Abstract: The article presents and evaluates a new scheduling algorithm that employs co-allocation for space slicing multi-clusters.

Proceedings Article
01 Jan 2006
TL;DR: This paper analytically model the process of checkpointing in terms of mean-time-between-failure of the system, amount of memory being checkpointed, sustainable I/O bandwidth to the stable storage, and frequency of checkpoints, and identifies the optimum frequency to be used on systems with given specifications.
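A classical first-order answer to the optimal-frequency question is Young's approximation, tau_opt = sqrt(2 * delta * MTBF), where delta is the time to write one checkpoint (memory footprint over sustainable I/O bandwidth). The paper's model may differ; the sketch below only conveys the trade-off between checkpoint overhead and lost work:

```python
from math import sqrt

def checkpoint_overhead_time(memory_gb, bandwidth_gb_per_s):
    # delta: time to write one checkpoint to stable storage.
    return memory_gb / bandwidth_gb_per_s

def young_interval(delta, mtbf):
    """Young's first-order optimal checkpoint interval: sqrt(2 * delta * MTBF).
    Checkpointing more often wastes bandwidth; less often loses more work
    per failure."""
    return sqrt(2.0 * delta * mtbf)

# Hypothetical system: 200 GB of checkpoint state, 1 GB/s sustained I/O,
# one failure per day on average.
delta = checkpoint_overhead_time(memory_gb=200.0, bandwidth_gb_per_s=1.0)  # 200 s
tau = young_interval(delta, mtbf=24 * 3600.0)
print(round(tau))  # 5879 s, i.e. checkpoint roughly every 98 minutes
```

All four quantities named in the TL;DR (MTBF, checkpointed memory, I/O bandwidth, frequency) appear in this one formula, which is why the optimum shifts as systems scale.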

Proceedings Article
01 Jan 2006
TL;DR: This paper derives a resource allocation mechanism, which acts as a platform at the subgame level for the agents to compete, and shows that this mechanism exhibits Nash equilibria at the subgame, game, and supergame levels.
Abstract: This paper proposes a unique replica placement technique using the concepts of a "supergame". The supergame allows the agents who represent the data objects to continuously compete for the limited available server memory space, so as to acquire the rights to place data objects at the servers. At any given instance in time, the supergame is represented by a game, which is a collection of subgames played concurrently at each server in the system. We derive a resource allocation mechanism, which acts as a platform at the subgame level for the agents to compete. This approach allows us to transparently monitor the actions of the agents, who in a non-cooperative environment strategically place the data objects to reduce user access time and latency, which in turn adds reliability and fault tolerance to the system. We show that this mechanism exhibits Nash equilibrium at the subgame level, which in turn conforms to game-level and supergame-level Nash equilibria, respectively. The mechanism is extensively evaluated against some well-known algorithms, such as greedy, branch and bound, game-theoretical auctions, and genetic algorithms. The experimental results reveal that the mechanism provides excellent solution quality while maintaining fast execution time.

Proceedings Article
01 Jan 2006
TL;DR: This framework will deal with a number of important criteria including self-organization, emergence, and others and it is shown that the framework can capture the common underlying structure of complex adaptive systems through demonstrating its application in specific biological domains.
Abstract: In this paper, we provide a general purpose simulation framework for complex adaptive systems. Our framework will deal with a number of important criteria including self-organization, emergence, and others. We then show that the framework can capture the common underlying structure of complex adaptive systems through demonstrating its application in specific biological domains.

Proceedings Article
01 Jan 2006
TL;DR: An MPI Python module, MYMPI, for parallel programming in Python using the Message Passing Interface (MPI), is introduced, and an example, workit, shows that the module can easily be used to create simple parallel pipelines.
Abstract: We introduce an MPI Python module, MYMPI, for parallel programming in Python using the Message Passing Interface (MPI). This is a true Python module which runs with a standard Python interpreter. In this paper we discuss the motivation for creating the MYMPI module, along with differences between MYMPI and pyMPI, another MPI Python interpreter. Additionally, we discuss projects that have used the MYMPI module: Continuity, in computational biology, and Montage, for astronomical mosaicking. We present an example, workit, showing that the module can easily be used to create simple parallel pipelines.


Proceedings Article
01 Jan 2006
TL;DR: Several simulations are carried out to show whether or not GridSim is a suitable environment for simulating the Distributed Ontology Framework, and the similarities between the DOF and the GridSim environment are explored.
Abstract: There are many simulation environments available that can be used to simulate components of Grid computing, and a few that allow data-intensive jobs to be simulated. GridSim, with the Data.Grid extension, combines both of these capabilities, allowing simulation of data-intensive jobs on Grid resources. GridSim is explored in this paper with an aim to simulate the Distributed Ontology Framework (DOF) in the Semantic Grid environment. The DOF requires many data resources to be located as well as large-scale processing to take place on the collected data. Several simulation environments are discussed and the similarities between the DOF and the GridSim environment are explored. Several simulations are carried out to show whether or not GridSim is a suitable environment for simulating the Distributed Ontology Framework.

Proceedings Article
01 Jan 2006
TL;DR: The SOR iterative technique is applied to the MIP algorithm and the optimum over-relaxation parameter is provided for this unconditionally stable domain decomposition method.
Abstract: Domain decomposition methods are widely used to solve parabolic partial differential equations with Dirichlet, Neumann, or mixed boundary conditions. The modified implicit prediction (MIP) algorithm is an unconditionally stable domain decomposition method. In this paper, the SOR iterative technique is applied to the MIP algorithm and the optimum over-relaxation parameter is provided.
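For a model 1-D problem with the tridiagonal (-1, 2, -1) matrix, the optimum over-relaxation parameter has the classical closed form omega_opt = 2 / (1 + sqrt(1 - rho_J^2)), where rho_J = cos(pi / (n + 1)) is the Jacobi spectral radius. The sketch below applies SOR with that choice; it is illustrative only and does not reproduce the MIP algorithm:

```python
import math

def sor(A, b, omega, tol=1e-10, max_iter=10_000):
    """Successive over-relaxation for A x = b (A square, nonzero diagonal).
    Stops when the largest component update falls below tol."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        err = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (1 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
            err = max(err, abs(new - x[i]))
            x[i] = new
        if err < tol:
            break
    return x

# 1-D Poisson-type system; classical optimum omega for this model problem.
n = 10
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
rho = math.cos(math.pi / (n + 1))
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho * rho))
x = sor(A, b, omega_opt)
residual = max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
print(residual < 1e-8)  # True
```

Any omega in (0, 2) converges for this symmetric positive definite system; the optimum merely minimizes the iteration count, which is what the abstract's parameter choice is about.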