Showing papers in "Concurrency and Computation: Practice and Experience in 2007"


Journal ArticleDOI
TL;DR: This paper discusses the design and high‐performance implementation of collective communications operations on distributed‐memory computer architectures and develops implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI.
Abstract: We discuss the design and high-performance implementation of collective communications operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4 (R) processor cluster are included.

238 citations
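
The techniques the abstract refers to are largely classical tree-based and recursive-doubling algorithms. As a hedged illustration (not the authors' code, and independent of any particular MPI implementation), the following Python sketch computes the communication schedule of a binomial-tree broadcast, one of the standard building blocks of such collectives:

```python
# A minimal sketch of a binomial-tree broadcast schedule. Rank 0 is the root;
# in round k, every rank that already holds the data forwards it to the rank
# 2**k positions away, so p ranks finish in ceil(log2(p)) rounds.

import math

def binomial_broadcast_schedule(p):
    """Return, per round, the (sender, receiver) pairs for a broadcast from rank 0."""
    schedule = []
    have_data = {0}
    for k in range(math.ceil(math.log2(p))):
        round_pairs = []
        for src in sorted(have_data):
            dst = src + 2 ** k
            if dst < p:
                round_pairs.append((src, dst))
        have_data.update(dst for _, dst in round_pairs)
        schedule.append(round_pairs)
    return schedule

for rnd, pairs in enumerate(binomial_broadcast_schedule(8)):
    print(f"round {rnd}: {pairs}")
# round 0: [(0, 1)]
# round 1: [(0, 2), (1, 3)]
# round 2: [(0, 4), (1, 5), (2, 6), (3, 7)]
```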


Journal ArticleDOI
TL;DR: A novel testing tool, called MultiRace, which combines improved versions of Djit and Lockset—two very powerful on‐the‐fly algorithms for dynamic detection of apparent data races, and shows that the overheads imposed by MultiRace are often much smaller (orders of magnitude) than those obtained by other existing tools.
Abstract: Data race detection is essential for debugging multithreaded programs and assuring their correctness. Nevertheless, there is no single universal technique capable of handling the task efficiently, since the data race detection problem is computationally hard in the general case. Thus, all currently available tools, when applied to some general case program, usually result in excessive false alarms or in a large number of undetected races. Another major drawback of many currently available tools is that they are restricted, for performance reasons, to detection units of fixed size. Thus, they all suffer from the same problem—choosing a small unit might result in missing some of the data races, while choosing a large one might lead to false detection. We present a novel testing tool, called MultiRace, which combines improved versions of Djit and Lockset—two very powerful on-the-fly algorithms for dynamic detection of apparent data races. Both extended algorithms detect races in multithreaded programs that may execute on weak consistency systems, and may use two-way as well as global synchronization primitives. By employing novel technologies, MultiRace adjusts its detection to the native granularity of objects and variables in the program under examination. In order to monitor all accesses to each of the shared locations, MultiRace instruments the C++ source code of the program. It lets the user fine-tune the detection process, but otherwise is completely automatic and transparent. This paper describes the algorithms employed in MultiRace, gives highlights of its implementation issues, and suggests some optimizations. It shows that the overheads imposed by MultiRace are often much smaller (orders of magnitude) than those obtained by other existing tools. Copyright © 2006 John Wiley & Sons, Ltd.

126 citations
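
For readers unfamiliar with the base algorithms, here is a minimal sketch of the classic Lockset discipline check (in the style of Eraser) that MultiRace refines. The class and the recorded events are illustrative assumptions; none of MultiRace's extensions (adjustable granularity, weak consistency support, the Djit component) are modeled:

```python
# Minimal sketch of the classic Lockset check. For every shared location we
# intersect the set of locks held at each access; an empty intersection means
# no single lock consistently guards the location -- a potential race.

class LocksetChecker:
    def __init__(self):
        self.candidates = {}   # location -> set of locks still guarding it

    def access(self, location, locks_held):
        held = frozenset(locks_held)
        if location not in self.candidates:
            self.candidates[location] = set(held)   # first access
        else:
            self.candidates[location] &= held       # refine by intersection
        if not self.candidates[location]:
            print(f"potential data race on {location!r}")

checker = LocksetChecker()
checker.access("counter", {"L1"})        # thread A holds L1
checker.access("counter", {"L1", "L2"})  # thread B holds L1 and L2 -> ok
checker.access("counter", {"L2"})        # thread C holds only L2 -> race
```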



Journal ArticleDOI
TL;DR: This paper presents efforts to design and implement such an OpenMP compiler on top of Open64, an open-source compiler framework, by extending its existing analysis and optimization and adopting a source-to-source translator approach where a native back end is not available.
Abstract: OpenMP has gained wide popularity as an API for parallel programming on shared memory and distributed shared memory platforms. Despite its broad availability, there remains a need for a portable, robust, open source, optimizing OpenMP compiler for C/C++/Fortran 90, especially for teaching and research, for example into its use on new target architectures, such as SMPs with chip multi-threading, as well as learning how to translate for clusters of SMPs. In this paper, we present our efforts to design and implement such an OpenMP compiler on top of Open64, an open source compiler framework, by extending its existing analysis and optimization and adopting a source-to-source translator approach where a native back end is not available. The compilation strategy we have adopted and the corresponding runtime support are described. The OpenMP validation suite is used to determine the correctness of the translation. The compiler's behavior is evaluated using benchmark tests from the EPCC microbenchmarks and the NAS parallel benchmark. Copyright © 2007 John Wiley & Sons, Ltd.

102 citations


Journal ArticleDOI
TL;DR: This paper proposes an autonomous SLN formalism to support intelligent applications on large-scale networks that integrates the SLN logical reasoning with the SLN analogical reasoning and the SLN inductive reasoning, as well as existing techniques, to form an autonomous semantic overlay.
Abstract: A semantic link network (SLN) consists of nodes (entities, features, concepts, schemas or communities) and semantic links between nodes. This paper proposes an autonomous SLN formalism to support intelligent applications on large-scale networks. The formalism integrates the SLN logical reasoning with the SLN analogical reasoning and the SLN inductive reasoning, as well as existing techniques, to form an autonomous semantic overlay. The SLN logical reasoning mechanism derives implicit semantic relations by a semantic matrix and relevant addition and multiplication operations based on semantic link rules. The SLN analogical reasoning mechanism proposes conjectures on semantic relations based on structural mapping between nodes. The SLN inductive reasoning mechanism derives general semantics from special semantics. The cooperation of diverse reasoning mechanisms enhances the reasoning ability of each, therefore providing a powerful semantic ability for the semantic overlay. The self-organizing diverse scales of the SLN support the intelligent applications of the Knowledge Grid. Copyright © 2006 John Wiley & Sons, Ltd.

86 citations
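
The semantic-matrix reasoning in the abstract can be made concrete: matrix entries are sets of relation labels, "multiplication" composes two links through a rule table, and "addition" is set union. A minimal sketch, in which the nodes and the single composition rule are invented for illustration rather than taken from the paper:

```python
# SLN-style logical reasoning over a matrix of relation-label sets.
# The rule partOf o isA => partOf below is an illustrative example.

nodes = ["wheel", "car", "vehicle"]
RULES = {("partOf", "isA"): "partOf"}          # illustrative composition rule

# links[i][j] is the set of labels on the link from nodes[i] to nodes[j]
links = [[set() for _ in nodes] for _ in nodes]
links[0][1] = {"partOf"}   # wheel partOf car
links[1][2] = {"isA"}      # car isA vehicle

def derive(links):
    """One round of matrix 'multiplication': add composed (implicit) links."""
    n = len(links)
    derived = [[set(links[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                for r1 in links[i][k]:
                    for r2 in links[k][j]:
                        if (r1, r2) in RULES:
                            derived[i][j].add(RULES[(r1, r2)])  # union = addition
    return derived

closure = derive(links)
print(closure[0][2])   # {'partOf'}: wheel partOf vehicle, derived implicitly
```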


Journal ArticleDOI
TL;DR: This paper outlines a road map for combining the research within the different disciplines of testing multi-threaded programs and for evaluating the quality of this research, with the goal of creating a benchmark that can be used to evaluate different solutions.
Abstract: Multi-threaded code is becoming very common, both on the server side, and very recently for personal computers as well. Consequently, looking for intermittent bugs is a problem that is receiving more and more attention. As there is no silver bullet, research focuses on a variety of partial solutions. We outline a road map for combining the research within the different disciplines of testing multi-threaded programs and for evaluating the quality of this research. We have three main goals. First, to create a benchmark that can be used to evaluate different solutions. Second, to create a framework with open APIs that enables the combination of techniques in the multi-threading domain. Third, to create a focus for the research in this area around which a community of people who try to solve similar problems with different techniques can congregate. We have started creating such a benchmark and describe the lessons learned in the process. The framework will enable technology developers, for example, developers of race detection algorithms, to concentrate on their components and use other ready-made components (e.g., an instrumentor) to create a testing solution.

78 citations


Journal ArticleDOI
TL;DR: This paper describes in detail the implementation of code to solve a linear system of equations using Gaussian elimination in single precision with iterative refinement of the solution to the full double-precision accuracy.
Abstract: This paper describes the design concepts behind implementations of mixed-precision linear algebra routines targeted for the Cell processor. It describes in detail the implementation of code to solve a linear system of equations using Gaussian elimination in single precision with iterative refinement of the solution to the full double-precision accuracy. By utilizing this approach the algorithm achieves close to an order of magnitude higher performance on the Cell processor than the performance offered by the standard double-precision algorithm. The code is effectively an implementation of the high-performance LINPACK benchmark, as it meets all of the requirements concerning the problem being solved and the numerical properties of the solution. Copyright © 2007 John Wiley & Sons, Ltd.

77 citations
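
The underlying scheme is standard enough to sketch with numpy and scipy standing in for the Cell-specific single-precision kernels; this shows the general mixed-precision iterative refinement idea, not the paper's tuned implementation:

```python
# Factor and solve in single precision (the O(n^3) work), then refine the
# solution with cheap O(n^2) steps carried out in double precision.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, iters=10, tol=1e-12):
    lu = lu_factor(A.astype(np.float32))              # Gaussian elimination in float32
    x = lu_solve(lu, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                 # residual in float64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        correction = lu_solve(lu, r.astype(np.float32)).astype(np.float64)
        x += correction                               # refine toward double accuracy
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # roughly double-precision level
```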


Journal ArticleDOI
TL;DR: This paper divides conventional inconsistency into weak consistency, weak inconsistency and strong inconsistency, treats conventional consistency as strong consistency, and develops algorithms to verify each of these states.
Abstract: To verify fixed-time constraints in Grid workflow systems, consistency and inconsistency conditions have been defined in conventional verification work. However, with a view of the run-time uncertainty of activity completion duration, we argue that, although the conventional consistency condition is feasible, the conventional inconsistency condition is too restrictive and covers several different states. These states, which are handled conventionally by the same exception handling, should be handled differently for the purpose of cost saving. Therefore, in this paper, we divide conventional inconsistency into weak consistency, weak inconsistency and strong inconsistency and treat conventional consistency as strong consistency. Correspondingly, we develop some algorithms on how to verify them. Based on this, for weak consistency we present a method on how to adjust it to strong consistency by using mean activity time redundancy and temporal dependency between fixed-time constraints. For weak inconsistency, we analyse briefly why it can be handled by simpler and more cost-saving exception handling while for strong inconsistency, the conventional exception handling remains deployed. The final quantitative evaluation demonstrates that our research can achieve better cost-effectiveness than the conventional work. Copyright © John Wiley & Sons, Ltd.

74 citations
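
One plausible reading of the four states, consistent with the abstract but with numeric definitions assumed rather than copied from the paper, classifies a constraint by comparing its deadline against the minimum, mean and maximum estimated completion times of the remaining activities:

```python
# Four-state verification of a fixed-time constraint (assumed definitions).

def classify(min_time, mean_time, max_time, deadline):
    assert min_time <= mean_time <= max_time
    if deadline >= max_time:
        return "strong consistency"      # always met: nothing to do
    if deadline >= mean_time:
        return "weak consistency"        # adjustable via time redundancy
    if deadline >= min_time:
        return "weak inconsistency"      # cheaper, lightweight handling
    return "strong inconsistency"        # conventional exception handling

print(classify(5, 8, 12, 14))  # strong consistency
print(classify(5, 8, 12, 9))   # weak consistency
print(classify(5, 8, 12, 6))   # weak inconsistency
print(classify(5, 8, 12, 4))   # strong inconsistency
```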


Journal ArticleDOI
TL;DR: This paper discusses the support for standards-based Grid portlets using the Velocity development environment, and describes advance information, semantic data, collaboration, and science application services developed by the Open Grid Computing Environments collaboration.
Abstract: We review the efforts of the Open Grid Computing Environments collaboration. By adopting a general three-tiered architecture based on common standards for portlets and Grid Web services, we can deliver numerous capabilities to science gateways from our diverse constituent efforts. In this paper, we discuss our support for standards-based Grid portlets using the Velocity development environment. Our Grid portlets are based on abstraction layers provided by the Java CoG kit, which hide the differences of different Grid toolkits. Sophisticated services are decoupled from the portal container using Web service strategies. We describe advance information, semantic data, collaboration, and science application services developed by our consortium. Copyright © 2006 John Wiley & Sons, Ltd.

68 citations



Journal ArticleDOI
TL;DR: This paper sparsely samples performance data on two radically different platforms across large, multidimensional parameter spaces and shows that models based on these data can predict performance within 2% to 7% of actual application runtimes.
Abstract: National Science Foundation Grant Number CCF-0444413; United States Department of Energy Grant Number W-7405-Eng-48
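
Only the funding acknowledgment survives for this entry, but the TL;DR states the method: sample a large multidimensional parameter space sparsely and fit a model that predicts runtime. A toy sketch of that workflow, where the synthetic runtime function and the basis features are assumptions made purely for illustration:

```python
# Sparse sampling of a (problem size, processor count) space, followed by a
# least-squares fit that predicts unseen configurations.

import numpy as np

rng = np.random.default_rng(1)

def runtime(n, p):                      # synthetic stand-in for the application
    return 2.0 * n / p + 0.05 * n + 3.0 * np.log2(p)

# Sparse sample: 20 random configurations out of a much larger space
n = rng.integers(100, 5000, size=20).astype(float)
p = 2.0 ** rng.integers(0, 7, size=20)
y = runtime(n, p)

# Fit on hand-chosen basis functions of the parameters
X = np.column_stack([n / p, n, np.log2(p), np.ones_like(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict an unseen configuration
n0, p0 = 3000.0, 16.0
pred = np.array([n0 / p0, n0, np.log2(p0), 1.0]) @ coef
print(pred, runtime(n0, p0))   # prediction vs. actual
```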

Journal ArticleDOI
TL;DR: Preliminary experiments show that the computed reputations based on the proposed reputation model can reflect the actual reputations of the simulated roles and therefore can fit in well with the user‐interactive question answering system.
Abstract: In this paper, we propose a user reputation model and apply it to a user‐interactive question answering system. It combines the social network analysis approach and the user rating approach. Social network analysis is applied to analyze the impact of participant users' relations to their reputations. User rating is used to acquire direct judgment of a user's reputation based on other users' experiences with this user. Preliminary experiments show that the computed reputations based on our proposed reputation model can reflect the actual reputations of the simulated roles and therefore can fit in well with our user‐interactive question answering system. Copyright © 2006 John Wiley & Sons, Ltd.
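
A minimal sketch of the two ingredients named in the abstract, with the who-rates-whom graph, the damping factor and the 50/50 blend all chosen for illustration rather than taken from the paper:

```python
# Reputation = blend of (a) the mean rating a user receives and (b) a
# PageRank-style score from the rating graph (one common form of social
# network analysis).

ratings = {  # rater -> {ratee: rating in [0, 1]}
    "alice": {"bob": 0.9, "carol": 0.4},
    "bob":   {"carol": 0.5},
    "carol": {"bob": 0.8},
}
users = sorted(set(ratings) | {v for r in ratings.values() for v in r})

# Direct component: mean rating received (0.5 if never rated)
direct = {}
for u in users:
    received = [r[u] for r in ratings.values() if u in r]
    direct[u] = sum(received) / len(received) if received else 0.5

# Network component: damped rank flowing along rating edges
rank = {u: 1.0 / len(users) for u in users}
for _ in range(50):  # every user here rates someone, so no dangling-node fix needed
    new = {u: 0.15 / len(users) for u in users}
    for rater, rated in ratings.items():
        for u in rated:
            new[u] += 0.85 * rank[rater] / len(rated)
    rank = new

reputation = {u: 0.5 * direct[u] + 0.5 * rank[u] for u in users}
print(reputation)  # carol's low ratings pull her below bob
```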

Journal ArticleDOI
TL;DR: This paper proposes a proactive method to build a semantic overlay, based on an epidemic protocol that clusters peers with similar content, that is, without requiring the user to specify preferences or to characterize the content of files being shared.
Abstract: Much research on content-based P2P searching for file-sharing applications has focused on exploiting semantic relations between peers to facilitate searching. Current methods suggest reactive ways to manage semantic relations: they rely on the usage of the underlying search mechanism, and infer semantic relationships based on the queries placed and the corresponding replies received. In this paper we follow a different approach, proposing a proactive method to build a semantic overlay. Our method is based on an epidemic protocol that clusters peers with similar content. Peer clustering is done in a completely implicit way, that is, without requiring the user to specify preferences or to characterize the content of files being shared. In our approach, each node maintains a small list of semantically optimal peers. Our simulation studies show that such a list is highly effective when searching files. The construction of this list through gossiping is efficient and robust, even in the presence of changes in the network. Copyright © 2007 John Wiley & Sons, Ltd.
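
The core protocol is easy to simulate. In the sketch below, each peer keeps a small list of its semantically closest peers and gossips each round to improve it; Jaccard overlap of keyword sets stands in for whatever content similarity a real deployment would use, and the view size and round count are arbitrary:

```python
# Gossip-based clustering of peers by content similarity: each round a peer
# merges a random contact's neighbour list with its own and keeps the top-k
# most similar peers. A simplification of the real epidemic protocol.

import random

random.seed(0)
N, K, ROUNDS = 40, 3, 30
genres = [{"rock", "pop"}, {"jazz", "blues"}, {"techno", "house"}]
content = {p: random.choice(genres) | {f"file{random.randrange(100)}"}
           for p in range(N)}

def sim(a, b):
    return len(content[a] & content[b]) / len(content[a] | content[b])

view = {p: random.sample([q for q in range(N) if q != p], K) for p in range(N)}

for _ in range(ROUNDS):
    for p in range(N):
        q = random.choice(view[p])                 # gossip partner
        pool = set(view[p]) | set(view[q]) | {q}
        pool.discard(p)
        view[p] = sorted(pool, key=lambda r: sim(p, r), reverse=True)[:K]

p = 0
print(content[p], [content[q] for q in view[p]])   # neighbours tend to share p's genre
```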

Journal ArticleDOI
TL;DR: JaMP is an adaptation of the OpenMP standard to Java that implements a large subset of the OpenMP specification with an expressiveness comparable to that of OpenMP, and a set of extensions that allow a better integration of OpenMP into the Java language.
Abstract: Although OpenMP is a widely agreed-upon standard for the C/C++ and Fortran programming languages for the semi-automatic parallelization of programs for shared memory machines, not much has been done on the binding of OpenMP to Java that targets clusters with distributed memory. This paper presents three major contributions: (1) JaMP is an adaptation of the OpenMP standard to Java that implements a large subset of the OpenMP specification with an expressiveness comparable to that of OpenMP; (2) we suggest a set of extensions that allow a better integration of OpenMP into the Java language; (3) we present our prototype implementation of JaMP in the research compiler Jackal, a software-based distributed shared memory implementation for Java. We evaluate the performance of JaMP with a set of micro-benchmarks and with OpenMP versions of the parallel Java Grande Forum (JGF) benchmarks. The micro-benchmarks show that OpenMP for Java can be implemented without much overhead. The JGF benchmarks achieve a good speed-up of 5–8 on eight nodes. Copyright © 2007 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: An overview of the LEAD project and the role that LEAD portal is playing in reaching its goals is given, and the various technologies used to bring powerful and complex scientific tools to educational and research users are described.
Abstract: The Linked Environments for Atmospheric Discovery (LEAD) Portal is a science application portal designed to enable effective use of Grid resources in exploring mesoscale meteorological phenomena. The aim of the LEAD Portal is to provide a more productive interface for doing experimental work by the meteorological research community, as well as bringing weather research to a wider class of users, meaning pre-college students in grades 6–12 and undergraduate college students. In this paper, we give an overview of the LEAD project and the role that LEAD portal is playing in reaching its goals. We then describe the various technologies we are using to bring powerful and complex scientific tools to educational and research users. These technologies—a fine-grained capability based authorization framework, an application service factory toolkit, and a Web services-based workflow execution engine and supporting tools—enable our team to deploy these once inaccessible, stovepipe scientific codes onto a Grid where they can be collectively utilized. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper describes the implementation and application of iPath, an incremental call-path profiler, which was used to profile two real-world applications: the MILC su3_rmd QCD distributed simulation and the Paradyn instrumentation daemon.
Abstract: Profiling is a key technique for achieving high performance. Call-path profiling is a refinement of this technique that classifies a function's behavior based on the path taken to reach the function. This information is particularly useful when optimizing programs that use libraries, such as those for communication (MPI or PVM), linear algebra (ScaLAPACK), or threading. We present a new method for call-path profiling called incremental call-path profiling. We profile only a subset of the functions in the program, allowing the use of more complex metrics while lowering the overhead. This combination of call-path information and complex metrics is particularly useful for localizing bottlenecks in frequently called functions. We also describe the implementation and application of iPath, an incremental call-path profiler. iPath was used to profile two real-world applications: the MILC su3_rmd QCD distributed simulation and the Paradyn instrumentation daemon. In both applications we found and removed call-path-specific bottlenecks. Our modifications to su3_rmd reduced the running time of the program from 3001 to 1652 s, a 45% decrease. Our modifications to the Paradyn instrumentation daemon greatly increased its efficiency. The time required to instrument our benchmark program was reduced from 296 to 6.4 s, a 98% decrease. Copyright © 2006 John Wiley & Sons, Ltd.
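
iPath instruments compiled binaries, which a snippet cannot reproduce, but the profiling idea translates to a toy Python illustration: keep a shadow call stack, attribute a function's time to the full path that reached it, and instrument only a chosen subset of functions (the "incremental" part):

```python
# Toy call-path profiler: the same function's time is split by calling path.

import sys, time
from collections import defaultdict

WATCHED = {"work"}                 # profile only this subset of functions
stack, path_time = [], defaultdict(float)

def profiler(frame, event, arg):
    name = frame.f_code.co_name
    if event == "call":
        stack.append((name, time.perf_counter()))
    elif event == "return" and stack:
        name, t0 = stack.pop()
        if name in WATCHED:
            path = " -> ".join(n for n, _ in stack) + " -> " + name
            path_time[path] += time.perf_counter() - t0

def work(n): sum(range(n))
def fast_caller(): work(10_000)
def slow_caller(): work(5_000_000)

sys.setprofile(profiler)
fast_caller(); slow_caller()
sys.setprofile(None)
for path, t in path_time.items():
    print(f"{t:8.4f}s  {path}")    # same function, distinguished by call path
```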

Journal ArticleDOI
TL;DR: The results indicate that the proposed MRMOGA (Multiple Resolution Multi‐Objective Genetic Algorithm) is a viable alternative to solve multi‐objective optimization problems in parallel, particularly when dealing with large search spaces.
Abstract: In this paper, we introduce MRMOGA (Multiple Resolution Multi‐Objective Genetic Algorithm), a new parallel multi‐objective evolutionary algorithm which is based on an injection island approach. This approach is characterized by adopting an encoding of solutions which uses a different resolution for each island. This approach allows us to divide the decision variable space into well‐defined overlapped regions to achieve an efficient use of multiple processors. Also, this approach guarantees that the processors only generate solutions within their assigned region. In order to assess the performance of our proposed approach, we compare it to a parallel version of an algorithm that is representative of the state‐of‐the‐art in the area, using standard test functions and performance measures reported in the specialized literature. Our results indicate that our proposed approach is a viable alternative to solve multi‐objective optimization problems in parallel, particularly when dealing with large search spaces. Copyright © 2006 John Wiley & Sons, Ltd.
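
The distinctive encoding idea is easy to sketch: every island searches the same real interval but with bitstrings of different lengths, and injection migration re-encodes a coarse solution at a finer island's resolution. The GA machinery (selection, crossover, Pareto ranking) is omitted, and the bit widths below are arbitrary:

```python
# Multiple-resolution encoding: islands share the interval but not the grid.

def decode(bits, lo=0.0, hi=1.0):
    """Map a bitstring to [lo, hi] at a resolution set by its length."""
    return lo + int(bits, 2) * (hi - lo) / (2 ** len(bits) - 1)

def reencode(x, nbits, lo=0.0, hi=1.0):
    """Re-express a real value at another island's (finer) resolution."""
    step = (hi - lo) / (2 ** nbits - 1)
    return format(round((x - lo) / step), f"0{nbits}b")

coarse = "101"                      # 3-bit island: only 8 distinct points
x = decode(coarse)                  # 5/7 = 0.714...
fine = reencode(x, 16)              # inject into a 16-bit island
print(x, fine, decode(fine))        # value preserved up to the fine resolution
```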

Journal ArticleDOI
TL;DR: A security model for community accounts is proposed, organized by the four As of security: Authentication, Authorization, Auditing and Accounting.
Abstract: Science gateways have emerged as a concept for allowing large numbers of users in communities to easily access high-performance computing resources which previously required a steep learning curve to utilize. In order to reduce the complexity of managing access for these communities, which can often be large and dynamic, the concept of community accounts is being considered. This paper proposes a security model for community accounts, organized by the four As of security: Authentication, Authorization, Auditing and Accounting. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: Within the myGrid project, key resources that can be shared are identified, including complete workflows, fragments of workflows and constituent services, and a unified descriptive model to support their later discovery is developed.
Abstract: Scientific workflows are becoming a valuable tool for scientists to capture and automate e-Science procedures. Their success brings the opportunity to publish, share, reuse and re-purpose this explicitly captured knowledge. Within the myGrid project, we have identified key resources that can be shared including complete workflows, fragments of workflows and constituent services. We have examined the alternative ways that these resources can be described by their authors (and subsequent users) and developed a unified descriptive model to support their later discovery. By basing this model on existing standards, we have been able to extend existing Web service and Semantic Web service infrastructure whilst still supporting the specific needs of the e-Scientist. The myGrid components enable a workflow lifecycle that extends beyond execution to include the discovery of previous relevant designs, the reuse of those designs and their subsequent publication. Experience with example groups of scientists indicates that this cycle is valuable. The growing number of workflows and services means more work is needed to support the user in effective ranking of search results and to support the re-purposing process. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A method is developed that generates space-filling curves quickly in parallel by reducing the generation to integer sorting, avoiding the serial curve generation that would hinder scalability at tens and hundreds of thousands of processors.
Abstract: In this paper we consider the scalability of parallel space-filling curve generation as implemented through parallel sorting algorithms. Multiple sorting algorithms are studied and results show that space-filling curves can be generated quickly in parallel on thousands of processors. In addition, performance models are presented that are consistent with measured performance and offer insight into performance on still larger numbers of processors. At large numbers of processors, the scalability of adaptive mesh refined codes depends on the individual components of the adaptive solver. One such component is the dynamic load balancer. In adaptive mesh refined codes, the mesh is constantly changing, resulting in load imbalance among the processors and requiring a load-balancing phase. The load balancing may occur often, requiring the load balancer to perform quickly. One common method for dynamic load balancing is to use space-filling curves. Space-filling curves, in particular the Hilbert curve, generate good partitions quickly in serial. However, at tens and hundreds of thousands of processors serial generation of space-filling curves will hinder scalability. In order to avoid this issue we have developed a method that generates space-filling curves quickly in parallel by reducing the generation to integer sorting. Copyright © 2007 John Wiley & Sons, Ltd.
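
The reduction is simple to demonstrate. The sketch below uses the Morton (Z-order) curve because its keys fit in a few lines; the paper focuses on the Hilbert curve, whose keys are costlier to compute but partition better. Keys are computed per cell, sorted (the step the paper parallelizes), and the sorted order is cut into equal chunks:

```python
# Space-filling-curve partitioning reduced to integer sorting.

def morton_key(x, y, bits=16):
    """Interleave the bits of (x, y) to get the cell's position on the curve."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

cells = [(x, y) for x in range(8) for y in range(8)]      # a toy 8x8 mesh
ordered = sorted(cells, key=lambda c: morton_key(*c))     # the integer sort

nproc = 4                                                 # cut into equal chunks
chunk = len(ordered) // nproc
parts = [ordered[i * chunk:(i + 1) * chunk] for i in range(nproc)]
print(parts[0])   # spatially contiguous cells -> a good partition
```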


Journal ArticleDOI
TL;DR: A method for verifying concurrent Java components that includes ConAn and complements it with other static and dynamic verification tools and techniques is proposed, based on an analysis of common concurrency problems and concurrency failures in Java components.
Abstract: The Java programming language supports concurrency. Concurrent programs are harder to verify than their sequential counterparts due to their inherent non-determinism and a number of specific concurrency problems, such as interference and deadlock. In previous work, we have developed the ConAn testing tool for the testing of concurrent Java components. ConAn has been found to be effective at testing a large number of components, but there are certain classes of failures that are hard to detect using ConAn. Although a variety of other verification tools and techniques have been proposed for the verification of concurrent software, they each have their strengths and weaknesses. In this paper, we propose a method for verifying concurrent Java components that includes ConAn and complements it with other static and dynamic verification tools and techniques. The proposal is based on an analysis of common concurrency problems and concurrency failures in Java components. As a starting point for determining the concurrency failures in Java components, a Petri-net model of Java concurrency is used. By systematically analysing the model, we come up with a complete classification of concurrency failures. The classification and analysis are then used to determine suitable tools and techniques for detecting each of the failures. Finally, we propose to combine these tools and techniques into a method for verifying concurrent Java components. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper presents an overview of the key challenges that need to be addressed for the integration of benchmarking practices, techniques, and tools in emerging Grid computing infrastructures and proposes the use of ontologies for organizing and describing benchmarking metrics.
Abstract: Grid benchmarking is an important and challenging topic of Grid computing research. In this paper, we present an overview of the key challenges that need to be addressed for the integration of benchmarking practices, techniques, and tools in emerging Grid computing infrastructures. We discuss the problems of performance representation, measurement, and interpretation in the context of Grid benchmarking, and propose the use of ontologies for organizing and describing benchmarking metrics. Finally, we present a survey of ongoing research efforts that develop benchmarks and benchmarking tools for the Grid. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper presents an environment called MATE (Monitoring, Analysis and Tuning Environment) that has been developed to provide dynamic tuning of parallel/distributed applications and shows practical experiments conducted with MATE to prove its effectiveness and profitability.
Abstract: The main goal of parallel/distributed applications is to solve the considered problem as fast as possible using the available resources. In this context, the application performance becomes a crucial issue. Developers of these applications must optimize them if they are to fulfill the promise of high‐performance computation. To improve performance, developers search for bottlenecks by analyzing application behavior, try to identify performance problems, determine their causes and overcome them by changing the source code of the application. Current approaches require developers to do these tasks manually and imply a high degree of expertise. Therefore, another approach is needed to help developers during the optimization process. This paper presents the dynamic tuning approach that addresses these issues. In this approach, many tasks are automated and the user intervention and required experience may be significantly reduced. An application is monitored, its performance bottlenecks are detected and it is modified automatically during execution, without recompiling or re‐running it. The introduced modifications adapt the application behavior to changing conditions. We present an environment called MATE (Monitoring, Analysis and Tuning Environment) that has been developed to provide dynamic tuning of parallel/distributed applications. We also show practical experiments conducted with MATE to prove its effectiveness and profitability. Copyright © 2006 John Wiley & Sons, Ltd.
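
The monitor-analyse-tune loop can be shown in miniature. Everything below (the workload, the metric, the single "chunk size" knob and the greedy analysis rule) is a toy stand-in; MATE itself instruments real parallel applications and modifies them during execution:

```python
# Dynamic tuning in miniature: measure, analyse, adjust a knob, repeat.

import time

def process_batch(items, chunk):     # the "application" being tuned
    t0 = time.perf_counter()
    for i in range(0, len(items), chunk):
        sum(items[i:i + chunk])      # stand-in work, sensitive to chunk size
    return time.perf_counter() - t0

items, chunk = list(range(200_000)), 10
best = float("inf")
for step in range(8):                # monitor -> analyse -> tune, in a loop
    elapsed = process_batch(items, chunk)
    print(f"chunk={chunk:<6} {elapsed:.4f}s")
    if elapsed < best:               # naive analysis: keep growing the chunk
        best, chunk = elapsed, chunk * 4
    else:
        break                        # performance regressed; stop tuning
```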

Journal ArticleDOI
TL;DR: To prototype the collaboration and visualization infrastructure of VLab, a Grid and Web‐Service‐based system for enabling distributed and collaborative computational chemistry and material science applications for the study of planetary materials, a service is developed that transforms a scalar data set into its wavelet representation.
Abstract: We present the initial architecture and implementation of VLab, a Grid and Web-Service-based system for enabling distributed and collaborative computational chemistry and material science applications for the study of planetary materials. The requirements of VLab include job preparation and submission, job monitoring, data storage and analysis, and distributed collaboration. These components are divided into client entry (input file creation, visualization of data, task requests) and back-end services (storage, analysis, computation). Clients and services communicate through NaradaBrokering, a publish/subscribe Grid middleware system that identifies specific hardware information with topics rather than IP addresses. We describe three aspects of VLab in this paper: (1) managing user interfaces and input data with JavaBeans and Java Server Faces; (2) integrating Java Server Faces with the Java CoG Kit; and (3) designing a middleware framework that supports collaboration. To prototype our collaboration and visualization infrastructure, we have developed a service that transforms a scalar data set into its wavelet representation. General adaptors are placed between the endpoints and NaradaBrokering, which serve to isolate the clients/services from the middleware. This permits client and service development independently of potential changes to the middleware. Copyright © 2007 John Wiley & Sons, Ltd.
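
The abstract's wavelet service can be illustrated with the simplest instance of such a transform, a one-level Haar decomposition; the real VLab service (its language, multi-level transform and NaradaBrokering plumbing) is not reproduced here:

```python
# One-level Haar wavelet transform of a scalar data set, and its inverse.

import numpy as np

def haar_1d(signal):
    """Split a signal into coarse averages and detail coefficients."""
    s = np.asarray(signal, dtype=float)
    assert len(s) % 2 == 0
    avg = (s[0::2] + s[1::2]) / np.sqrt(2)     # coarse approximation
    det = (s[0::2] - s[1::2]) / np.sqrt(2)     # detail coefficients
    return avg, det

def inverse_haar_1d(avg, det):
    out = np.empty(2 * len(avg))
    out[0::2] = (avg + det) / np.sqrt(2)
    out[1::2] = (avg - det) / np.sqrt(2)
    return out

data = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
avg, det = haar_1d(data)
print(np.allclose(inverse_haar_1d(avg, det), data))   # True: lossless
```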

Journal ArticleDOI
TL;DR: The Grid Resource Broker (GRB) is a Grid portal built on a set of high-level, Globus-Toolkit-based Grid libraries called the GRB libraries; it leverages the Liferay framework to provide users with an intuitive, highly customizable Web GUI.
Abstract: This paper describes the Grid Resource Broker (GRB), a Grid portal built leveraging a set of high‐level, Globus‐Toolkit‐based Grid libraries called GRB libraries. The portal leverages the Liferay framework to provide users with an intuitive, highly customizable Web GUI. The underlying GRB middleware allows trusted users seamless access to their computational Grid environments. Copyright © 2007 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The focus of this paper is the automation of the Probabilistic Miss Equations (PME) model, an analytical model of the cache behavior that provides fast and accurate predictions for codes with irregular access patterns and its integration in the XARK compiler is described.
Abstract: The memory hierarchy plays an essential role in the performance of current computers, thus good analysis tools that help predict and understand its behavior are required. Analytical modeling is the ideal base for such tools if its traditional limitations in accuracy and scope of application are overcome. While there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns due to the increased difficulty of analyzing them. Nevertheless, many important applications exhibit this kind of pattern, and their lack of locality makes them more cache-demanding, which makes their study more relevant. The focus of this paper is the automation of the Probabilistic Miss Equations (PME) model, an analytical model of the cache behavior that provides fast and accurate predictions for codes with irregular access patterns. The paper defines the information requirements of the PME model and describes its integration in the XARK compiler, a research compiler oriented to automatic kernel recognition in scientific codes. We show how to exploit the powerful information-gathering capabilities provided by this compiler to allow automated modeling of loop-oriented scientific codes. Experimental results that validate the correctness of the automated PME model are also presented.
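
Reproducing the PME equations is beyond a snippet, so the sketch below instead simulates the behavior the model predicts: a tiny direct-mapped cache run over a regular and an irregular (randomly permuted) traversal of the same array, showing why irregular access patterns are so much more cache-demanding. All cache parameters are arbitrary:

```python
# Direct-mapped cache simulation contrasting regular vs. irregular traversals.

import random

def miss_ratio(addresses, lines=64, line_words=8):
    cache = [None] * lines                      # direct-mapped tag store
    misses = 0
    for a in addresses:
        block = a // line_words                 # which cache line the word lives in
        idx = block % lines
        if cache[idx] != block:
            cache[idx] = block
            misses += 1
    return misses / len(addresses)

n = 32_768
regular = list(range(n))                        # stride-1 traversal
irregular = regular[:]                          # same addresses, random order
random.seed(0); random.shuffle(irregular)

print(miss_ratio(regular))    # ~1/8: one miss per 8-word line
print(miss_ratio(irregular))  # ~1: almost every access misses
```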

Journal ArticleDOI
TL;DR: The notion of knowledge energy as the driving force behind the formation of an autonomous knowledge flow network is proposed, and the underlying principles are explored.
Abstract: A knowledge flow is invisible but it plays an important role in ordering knowledge exchange when working in a team. It can help achieve effective team knowledge management by modeling, optimizing, monitoring, and controlling the operation of knowledge flow processes. This paper proposes the notion of knowledge energy as the driving force behind the formation of an autonomous knowledge flow network, and explores the underlying principles. Knowing these principles helps teams and the support systems improve cooperation by monitoring the knowledge energy of nodes, by evaluating and adjusting knowledge flows, and by adopting appropriate strategies. A knowledge flow network management mechanism can help improve the efficiency of knowledge-intensive distributed teamwork. Copyright © 2006 John Wiley & Sons, Ltd.