
Showing papers in "Concurrency and Computation: Practice and Experience in 2001"


Journal ArticleDOI
TL;DR: Developing advanced applications for the emerging national-scale 'Computational Grid' infrastructures remains difficult because Grid services may not be compatible with the commodity distributed-computing technologies and frameworks developers have used previously.
Abstract: In this paper we report on the features of the Java Commodity Grid Kit. The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus protocols, allowing the Java CoG Kit to communicate also with the C Globus reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise, and peer-to-peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus software. In this paper we also report on the efforts to develop server-side Java CoG Kit components. As part of this research we have implemented a prototype pure-Java resource management system that enables one to run Globus jobs on platforms on which a Java virtual machine is supported, including Windows NT machines.
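As an illustration of the client-side usage described above, the sketch below submits a Globus GRAM job through the Java CoG Kit. This is a minimal sketch only: the class and method names (org.globus.gram.GramJob, request) follow the historical Java CoG Kit client API, and the RSL string and gatekeeper contact are placeholders.

    import org.globus.gram.GramJob;

    public class SubmitExample {
        public static void main(String[] args) throws Exception {
            // RSL (Resource Specification Language) description of the job;
            // the executable path is a placeholder.
            String rsl = "&(executable=/bin/hostname)";
            GramJob job = new GramJob(rsl);
            // Contact string of a Globus gatekeeper (placeholder host).
            job.request("gatekeeper.example.org:2119/jobmanager");
            System.out.println("Job submitted.");
        }
    }

Because the protocol spoken is GRAM, the same client code can target either the C Globus gatekeeper or the prototype pure-Java resource manager described in the paper.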

386 citations


Journal ArticleDOI
TL;DR: A Hoare-style calculus for a substantial subset of Java Card, called Java ℓight, which includes side-effecting expressions, mutual recursion, dynamic method binding, full exception handling, and static class initialization.
Abstract: This article presents a Hoare-style calculus for a substantial subset of Java Card, which we call Java ℓight. In particular, the language includes side-effecting expressions, mutual recursion, dynamic method binding, full exception handling, and static class initialization. The Hoare logic of partial correctness is proved not only sound (w.r.t. our operational semantics of Java ℓight, described in detail elsewhere) but even complete. It is the first logic for an object-oriented language that is provably complete. The completeness proof uses a refinement of the Most General Formula approach. The proof of soundness gives new insights into the role of type safety. Further by-products of this work are a new general methodology for handling side-effecting expressions and their results, the discovery of the strongest possible rule of consequence, and a flexible Call rule for mutual recursion. We also give a small but non-trivial application example. All definitions and proofs have been carried out formally with the interactive theorem prover Isabelle/HOL. This not only guarantees rigorous definitions, but also gives maximal confidence in the results obtained.
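For readers unfamiliar with the terminology, a rule of consequence in a partial-correctness Hoare logic has the following standard shape (shown here in its textbook form; the strengthened rule discovered in the paper differs in detail):

    \[
    \frac{P \Rightarrow P' \qquad \{P'\}\; c\; \{Q'\} \qquad Q' \Rightarrow Q}
         {\{P\}\; c\; \{Q\}}
    \]

It allows a proved triple to be reused under a stronger precondition and a weaker postcondition, which is what makes the Most General Formula approach to the completeness proof workable.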

102 citations


Journal ArticleDOI
TL;DR: This special issue is devoted to selected papers from the ACM 2000 Java Grande Conference, held in San Francisco on June 3–4, 2000, whose goal is to provide feedback to users and language developers on what is required to successfully deploy Java in a broad range of scientific and high-performance network computing systems.
Abstract: This special issue is devoted to selected papers from the ACM 2000 Java Grande Conference, held in San Francisco on June 3–4, 2000. All the papers have been revised and re-refereed to ensure appropriate journal quality. This was the fifth of a series of meetings exploring the use of the Java programming language for scientific and engineering computing and high-performance network computing, a range of applications that has been denoted with the epithet 'Grande'. The previous Java Grande workshops were held very successfully in Syracuse in 1996, Las Vegas in 1997, Palo Alto in 1998 and San Francisco in 1999. The proceedings were also published in special issues of Concurrency: Practice and Experience (volume 9, issues 6 and 11; volume 10, issues 11–13; volume 12, issues 6–8). The Java Grande conference focuses on the use of Java in the broad area of high-performance computing, including engineering and scientific applications, simulations, data-intensive applications, and other emerging application areas that exploit parallel and distributed computing or combine communication and computing. We believe that Java will play an increasingly important role in these areas. The goal is to provide feedback to users and language developers on what is required to successfully deploy Java in a broad range of scientific and high-performance network computing systems.

88 citations


Journal ArticleDOI
TL;DR: This paper overviews the data integration system AMOS II based on the wrapper‐mediator approach, which consists of a mediator database engine that can process and execute queries over data stored locally and in several external data sources, and object‐oriented multi‐database views for reconciliation of data and schema heterogeneities among sources with various capabilities.
Abstract: Integration of data from autonomous, distributed and heterogeneous data sources poses several technical challenges. This paper overviews the data integration system AMOS II, which is based on the wrapper-mediator approach. AMOS II consists of: (i) a mediator database engine that can process and execute queries over data stored locally and in several external data sources, and (ii) object-oriented (OO) multi-database views for reconciliation of data and schema heterogeneities among sources with various capabilities. The data stored in different types of data sources is translated and integrated using OO mediation primitives, providing the user with a consistent view of the data in all the sources. Through its multi-database facilities many distributed AMOS II systems can interoperate in a federation. Since most data reside in the data sources, and to achieve high performance, the core of the system is a main-memory DBMS having a storage manager, query optimizer, transactions, client–server interface, disk backup, etc. The AMOS II data manager is optimized for main-memory access and is extensible, so that new data types and query operators can be added or implemented in some external programming language. The extensibility is essential for providing seamless access to a variety of data sources. Copyright © 2001 John Wiley & Sons, Ltd.
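The wrapper-mediator division of labour can be pictured with a small interface sketch. All names below (Wrapper, Mediator) are invented for illustration and do not reproduce the AMOS II API, which is built around its own functional query language, AMOSQL:

    import java.util.Iterator;

    // Each external source is hidden behind a wrapper that translates
    // mediator sub-queries into source-specific requests and streams
    // result tuples back.
    interface Wrapper {
        Iterator<Object[]> execute(String subQuery);
    }

    // The mediator decomposes a global query over the OO views, pushes
    // sub-queries to the wrappers, and reconciles the returned data
    // under a single schema.
    class Mediator {
        Iterator<Object[]> query(String globalQuery, Wrapper... sources) {
            // Decomposition, optimization and reconciliation omitted.
            throw new UnsupportedOperationException("sketch only");
        }
    }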

64 citations


Journal ArticleDOI
TL;DR: A variant of Eva and Kristoffer Rose's proposal of an annotation of Java Virtual Machine code with types to enable a one‐pass verification of well‐typedness is formalized in the theorem prover Isabelle/HOL and soundness and completeness are proved.
Abstract: SUMMARY Eva and Kristoffer Rose proposed a (sparse) annotation of Java Virtual Machine code with types to enable a one-pass verification of welltypedness. We have formalized a variant of their proposal in the theorem prover Isabelle/HOL and proved soundness and completeness.

43 citations


Journal ArticleDOI
TL;DR: By following the paradigm introduced in [15], it is demonstrated that this domain decomposition solver may be coupled easily with a conventional mesh refinement code, thus allowing the accuracy, reliability and efficiency of mesh adaptivity to be utilized in a well load-balanced manner.
Abstract: We present a new domain decomposition algorithm for the parallel finite element solution of elliptic partial differential equations. As with most parallel domain decomposition methods, each processor is assigned one or more subdomains and an iteration is devised which allows the processors to solve their own subproblem(s) concurrently. The novel feature of this algorithm, however, is that each of these subproblems is defined over the entire domain, although the vast majority of the degrees of freedom for each subproblem are associated with a single subdomain (owned by the corresponding processor). This ensures that a global mechanism is contained within each of the subproblems tackled, and so no separate coarse grid solve is required in order to achieve rapid convergence of the overall iteration. Furthermore, by following the paradigm introduced in [15], it is demonstrated that this domain decomposition solver may be coupled easily with a conventional mesh refinement code, thus allowing the accuracy, reliability and efficiency of mesh adaptivity to be utilized in a well load-balanced manner. Finally, numerical evidence is presented which suggests that this technique has significant potential, both in terms of its rapid convergence properties and the efficiency of the parallel implementation.
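The overall structure of such iterations can be conveyed by a generic additive-Schwarz-style update, shown here only as a point of reference (the authors' actual scheme differs in how the subproblems are built):

    \[
    u^{k+1} \;=\; u^k \;+\; \sum_{i=1}^{p} R_i^{T} A_i^{-1} R_i\,\bigl(f - A u^{k}\bigr),
    \]

where $A_i$ is the operator of processor $i$'s subproblem and $R_i$ the corresponding restriction. In the classical setting a separate coarse-grid term must be added to keep the iteration count bounded; in the algorithm above that global component is already contained in each $A_i$, since every subproblem covers the entire domain (finely resolved only on its own subdomain).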

42 citations


Journal ArticleDOI
TL;DR: The design, implementation, and deployment of the DISCOVER Web‐based computational collaboratory is presented, which is to bring large distributed simulations to the scientists'/engineers' desktop by providing collaborative Web-based portals for monitoring, interaction and control.
Abstract: SUMMARY This paper presents the design, implementation, and deployment of the DISCOVER web-based computational collaboratory. Its primary goal is to bring large distributed simulations to the scientists’/engineers’ desktop by providing collaborative web-based portals for monitoring, interaction and control. DISCOVER supports a 3-tier architecture composed of detachable thin-clients at the frontend, a network of interactions servers in the middle, and a control network of sensors, actuators, and interaction agents at the back-end. The interaction servers enable clients to connect and collaboratively interact with registered applications using a browser. The application control network enables sensors and actuators to be encapsulated within, and directly deployed with the computational objects. The application interaction gateway manages overall interaction. It uses Java Native Interface to create Java proxies that mirror computational objects and allow them to be directly accessed at the interaction server. Security and authentication are provided using customizable access control lists and SSL-based secure servers.

40 citations


Journal ArticleDOI
TL;DR: This work presents the feature model as an extension of Java and gives two translations to Java, one via inheritance and the other via aggregation, and shows that it interacts nicely with several common language extensions such as type parameters, exceptions, and higher‐order functions.
Abstract: We propose a new model for flexible composition of objects from a set of features. Features are services of an object and are similar to classes in object-oriented languages. In many cases, features have to be adapted in the presence of other features, which is also called the feature interaction problem. We introduce explicit interaction handlers which can adapt features to other features by overriding methods. When features are composed, the appropriate interaction handling is added in a way which generalizes inheritance and aggregation. For a set of features, an exponential number of different feature combinations is possible, based on a quadratic number of interaction resolutions. We present the feature model as an extension of Java and give two translations to Java, one via inheritance and the other via aggregation. We show that the feature model interacts nicely with several common language extensions such as type parameters, exceptions, and higher-order functions. Copyright © 2001 John Wiley & Sons, Ltd.
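The two translations mentioned above can be sketched in plain Java. This is an illustrative reduction only (hypothetical class names; the paper's feature syntax is richer and its interaction handlers are generated during composition):

    // The core object to which a logging feature is added.
    class Base {
        void run() { /* core behaviour */ }
    }

    // (a) Translation via inheritance: the feature becomes a subclass
    // whose overriding method plays the role of an interaction handler.
    class LoggedViaInheritance extends Base {
        @Override
        void run() {
            System.out.println("logging before run");  // adaptation code
            super.run();
        }
    }

    // (b) Translation via aggregation: the feature wraps a reference to
    // the base object and forwards calls, adapting them on the way.
    class LoggedViaAggregation {
        private final Base inner = new Base();
        void run() {
            System.out.println("logging before run");  // adaptation code
            inner.run();
        }
    }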

39 citations


Journal ArticleDOI
TL;DR: The aim of the paper is to show how Petri net analysis techniques can be used to decide whether to use the traditional client-server, remote evaluation or mobile agent paradigm in designing a particular application.
Abstract: In this paper we study the actual convenience of using the agent programming paradigm for accessing distributed services. We try to point out the benefits of such a communication paradigm by providing an analytical study of its basic features in comparison with the client-server approach and remote evaluation. The aim of the paper is to show how Petri net analysis techniques can be used to decide whether to use the traditional client-server, remote evaluation or mobile agent paradigm in designing a particular application. To this end, we present several models of non-Markovian Petri nets, which have been solved through the WebSPN tool, and we provide a close comparison between the agent technique, the client-server and the remote evaluation communication paradigms. The results that we have obtained show that agents should not always be considered the only solution to any communication issue, since in several cases their use may even prove a drawback. We also focus our attention on providing some practical remarks, which can help the developer during the design process to select the communication paradigm which best suits the features of the application to be developed. Copyright © 2001 John Wiley & Sons, Ltd.

33 citations


Journal ArticleDOI
TL;DR: This paper implements efficient matrix multiplication for large matrices using the Intel Pentium single instruction multiple data (SIMD) floating point architecture and gives a detailed description of the register allocation, Level 1 and Level 2 cache blocking strategies that yield the best performance for the Pentium III family.
Abstract: Generalized matrix–matrix multiplication forms the kernel of many mathematical algorithms, hence a faster matrix–matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the Intel Pentium single instruction multiple data (SIMD) floating point architecture. The main difficulty with the Pentium and other commodity processors is the need to efficiently utilize the cache hierarchy, particularly given the growing gap between main-memory and CPU clock speeds. We give a detailed description of the register allocation and the Level 1 and Level 2 cache blocking strategies that yield the best performance for the Pentium III family. Our results demonstrate performance on average 2.09 times faster than the leading public domain matrix–matrix multiply routines, and comparable performance with Intel's SIMD small matrix–matrix multiply routines. Copyright © 2001 John Wiley & Sons, Ltd.
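The paper's kernels are written for the Pentium III's SSE unit, which pure Java cannot express, but the cache blocking strategy itself is language-independent. The sketch below shows the idea with an assumed tile size; real tile sizes are chosen so that the working set of the tiles fits in the target cache level:

    // Cache-blocked C += A * B for n x n matrices stored row-major in
    // 1-D arrays. BLOCK = 64 is an assumed illustrative tile size.
    public class BlockedGemm {
        static final int BLOCK = 64;

        static void gemm(double[] a, double[] b, double[] c, int n) {
            for (int ii = 0; ii < n; ii += BLOCK)
                for (int kk = 0; kk < n; kk += BLOCK)
                    for (int jj = 0; jj < n; jj += BLOCK)
                        // Multiply one tile pair; all indices stay inside
                        // the current tiles, maximizing cache reuse.
                        for (int i = ii; i < Math.min(ii + BLOCK, n); i++)
                            for (int k = kk; k < Math.min(kk + BLOCK, n); k++) {
                                double aik = a[i * n + k];
                                for (int j = jj; j < Math.min(jj + BLOCK, n); j++)
                                    c[i * n + j] += aik * b[k * n + j];
                            }
        }
    }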

31 citations


Journal ArticleDOI
TL;DR: This paper presents a model of fault‐tolerant holonic manufacturing systems (HMS) where each holon's activities are controlled by an intelligent software agent, and proposes how the IEC 1499 standard for distributed control systems could be used to implement this model.
Abstract: This paper presents a model of fault-tolerant holonic manufacturing systems (HMS) where each holon's activities are controlled by an intelligent software agent. Multiple agents schedule actions, resolve conflicts and manage information to produce, transport, assemble, inspect and store customized products. Our model provides robustness and distribution transparency across a shop-floor where unpredictable failures occur with machines, control software and communication networks. Each autonomous holon is composed of a hierarchy of large-grain functional components where interaction is carried out by user-defined cooperation strategies. These strategies enable holons to coordinate their behaviour through exchanging messages and sensing/actuating of their shared environment. Therefore, holonic agents can select suitable rescheduling and recovery mechanisms to tolerate faults and keep the manufacturing system working. We also propose how the IEC 1499 standard (Function Block Architecture) for distributed control systems could be used to implement our model. The model presented here is a crystallization of some abstract concepts from a generic cooperating agent system, with suitable extensions to meet the criteria of the ongoing HMS project. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The authors have developed and implemented an algorithm supporting one of the transitions from analysis to design, the transformation of scenario models into behavior models; it supports the Unified Modelling Language (UML), mapping the UML's collaboration diagrams into state transition diagrams.
Abstract: Current methods for object-oriented software development provide notation for the specification of models, yet do not sufficiently relate the different model types to each other, nor do they provide support for transformations from one model type to another. This makes transformations a manual activity, which increases the risk of inconsistencies among models and may lead to a loss of information. We have developed and implemented an algorithm supporting one of the transitions from analysis to design, the transformation of scenario models into behavior models. This algorithm supports the Unified Modelling Language (UML), mapping the UML's collaboration diagrams into state transition diagrams. We believe that CASE tools implementing such algorithms will be highly beneficial in object-oriented software development. In this paper, we provide an overview of our algorithm and discuss all its major steps. The algorithm is detailed in semi-formal English and illustrated with a number of examples. Furthermore, the algorithm is assessed from different perspectives, such as scope and role in the overall development process, issues in the design of the algorithm, complexity, implementation and experimentation, and related work. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A new approach to object replication in Java that allows the programmer to define groups of objects that can be replicated and updated as a whole, using reliable, totally‐ordered broadcast to send update methods to all machines containing a copy is described and evaluated.
Abstract: We describe and evaluate a new approach to object replication in Java, aimed at improving the performance of parallel programs. Our programming model allows the programmer to define groups of objects that can be replicated and updated as a whole, using reliable, totally-ordered broadcast to send update methods to all machines containing a copy. The model has been implemented in the Manta high-performance Java system. We evaluate system performance both with microbenchmarks and with a set of five parallel applications. For the applications, we also evaluate ease of programming, compared to RMI implementations. We present performance results for a Myrinet-based workstation cluster as well as for a wide-area distributed system consisting of four such clusters. The microbenchmarks show that updating a replicated object on 64 machines takes only about three times the RMI latency in Manta. Applications using Manta's object replication mechanism perform at least as fast as manually optimized versions based on RMI, while keeping the application code as simple as with naive versions that use shared objects without taking locality into account. Using a replication mechanism in Manta's runtime system enables several unmodified applications to run efficiently even on the wide-area system. Copyright © 2001 John Wiley & Sons, Ltd.
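The programming model can be sketched as follows. The marker interface name is invented here for illustration (Manta's actual replication API may differ), but the division into broadcast write methods and local read methods is the essence of the model:

    // A replicated object: one logical object, one copy per machine.
    interface Replicated { }  // hypothetical marker seen by the runtime

    class SharedCounter implements Replicated {
        private int value;

        // Write method: the runtime broadcasts the call with reliable,
        // totally-ordered multicast, so all replicas apply the same
        // updates in the same order and stay consistent.
        synchronized void add(int delta) { value += delta; }

        // Read method: no remote communication, served by the local copy.
        synchronized int get() { return value; }
    }

Reads are thus as cheap as local method calls, which is what lets the replicated versions match manually optimized RMI code while staying as simple as naive shared-object code.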

Journal ArticleDOI
TL;DR: To be an effective platform for performance-sensitive real-time systems, commodity-off-the-shelf (COTS) distributed object computing (DOC) middleware must support application quality of service (QoS) requirements end-to-end; the lack of such support makes conventional COTS DOC middleware unsuited for applications with stringent latency, determinism, and priority preservation requirements.
Abstract: To be an effective platform for performance-sensitive real-time systems, commodity-off-the-shelf (COTS) distributed object computing (DOC) middleware must support application quality of service (QoS) requirements end-to-end. However, conventional COTS DOC middleware does not provide this support, which makes it unsuited for applications with stringent latency, determinism, and priority preservation requirements. It is essential, therefore, to develop standards-based, COTS DOC middleware that permits the specification, allocation, and enforcement of application QoS requirements end-to-end. The real-time CORBA and messaging specifications in the CORBA 2.4 standard are important steps towards defining standards-based, COTS DOC middleware that can deliver end-to-end QoS support at multiple levels in distributed and embedded real-time systems. These specifications still lack sufficient detail, however, to portably configure and control processor, communication, and memory resources for applications with stringent QoS requirements. This paper provides four contributions to research on real-time DOC middleware. First, we illustrate how the CORBA 2.4 real-time and messaging specifications provide a starting point to address the needs of an important class of applications with stringent real-time requirements. Second, we illustrate how the CORBA 2.4 specifications are not sufficient to solve all the issues within this application domain. Third, we describe how we have implemented portions of these specifications, as well as several enhancements, using TAO, which is our open-source real-time CORBA ORB. Finally, we evaluate the performance of TAO empirically to illustrate how its features address the QoS requirements for certain classes of real-time applications. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The capability combines a new high‐performance multi‐spectral camera system with a distributed algorithm that computes a spectral‐screening principal component transform (PCT) that allows image streams from a dispersed collection of cameras to be disseminated, viewed, and interpreted by a distributed group of analysts in real‐time.
Abstract: This paper describes a novel real-time multi-spectral imaging capability for surveillance applications. The capability combines a new high-performance multi-spectral camera system with a distributed algorithm that computes a spectral-screening principal component transform (PCT). The camera system uses a novel filter wheel design together with a high-bandwidth CCD camera to allow image cubes to be delivered at 110 frames per second with a spectral coverage between 400 and 1000 nm. The filters used in a particular application are selected to highlight a particular object based on its spectral signature. The distributed algorithm allows image streams from a dispersed collection of cameras to be disseminated, viewed, and interpreted by a distributed group of analysts in real-time. It operates on networks of commercial-off-the-shelf multiprocessors connected with high-performance (e.g. gigabit) networking, taking advantage of multi-threading where appropriate. The algorithm uses a concurrent formulation of the PCT to de-correlate and compress a multi-spectral image cube. Spectral screening is used to give features that occur infrequently (e.g. mechanized vehicles in a forest) equal importance to those that occur frequently (e.g. trees in the forest). A human-centered color-mapping scheme is used to maximize the impact of spectral contrast on the human visual system. To demonstrate the efficacy of the multi-spectral system, plant-life scenes with both real and artificial foliage are used. These scenes demonstrate the system's ability to distinguish elements of a scene that cannot be distinguished with the naked eye. The capability is evaluated in terms of visual performance, scalability, and real-time throughput. Our previous work on predictive analytical modeling is extended to answer practical design questions such as 'For a specified cost, what system can be constructed and what performance will it attain?' Copyright © 2001 John Wiley & Sons, Ltd.
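At the core of the distributed algorithm is the principal component transform. In its standard form (spectral screening changes how pixels are weighted, not the algebra) it de-correlates an image cube as follows:

    \[
    \Sigma = \frac{1}{N}\sum_{j=1}^{N}\,(x_j - \mu)(x_j - \mu)^{T},
    \qquad
    y_j = W^{T}(x_j - \mu),
    \]

where each $x_j$ is the spectral vector of one pixel, $\mu$ the mean spectrum, and the columns of $W$ the leading eigenvectors of the covariance matrix $\Sigma$. Keeping only the first few components compresses the cube while preserving most spectral contrast, and mapping those components to display colors yields the human-centered visualization described above.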

Journal ArticleDOI
TL;DR: This paper investigates the impact, in terms of the performance of the server and its adjacent links, of introducing active nodes into the network of on‐line auction systems using the stochastic process algebra formalism PEPA.
Abstract: The standard design of on-line auction systems places most of the computational load on the server and its adjacent links, resulting in a bottleneck in the system. In this paper, we investigate the impact, in terms of the performance of the server and its adjacent links, of introducing active nodes into the network. The performance study of the system is done using the stochastic process algebra formalism PEPA. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A comparison of Windows NT, Linux, and QNX (a real-time microkernel), based on expressive power, performance, and ease-of-use metrics, finds that none of these systems has a clear advantage over the others in all the metrics, but that each has its strong and weak points.
Abstract: Clusters use commodity hardware and software components to provide an environment for high-performance parallel processing. A major issue in the development of a cluster system is the choice of the operating system that will run on each node. We compare three alternatives: Windows NT, Linux, and QNX, a real-time microkernel. The comparison is based on expressive power, performance, and ease-of-use metrics. The result is that none of these systems has a clear advantage over the others in all the metrics, but each has its strong and weak points. Thus any choice of a base system will involve some technical compromises, but not major ones. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The decoupling of producers and consumers in time, space, and flow makes the publish/subscribe paradigm very attractive for large-scale distribution, especially in environments like the Internet.
Abstract: Many distributed applications have a strong requirement for efficient dissemination of large amounts of information to widely spread consumers in large networks. These include applications in e-commerce and telecommunication. Publish/subscribe is considered one of the most important interaction styles for modelling communication at large scale. Producers publish information for a topic and consumers subscribe to the topics they wish to be informed of. The decoupling of producers and consumers in time, space, and flow makes the publish/subscribe paradigm very attractive for large-scale distribution, especially in environments like the Internet. This paper describes the architecture and implementation of DACE (Distributed Asynchronous Computing Environment), a framework for publish/subscribe communication based on an object-oriented programming abstraction in the form of Distributed Asynchronous Collections (DACs). DACs capture the different variations of publish/subscribe without blurring their respective advantages. The architecture we present is tolerant to network partitions and crash failures. The underlying model is based on the notion of Topic Membership: a weak membership for the parties involved in a topic. We present how Topic Membership enables the realization of a robust and efficient reliable multicast at large scale. The protocol ensures that, inside a topic, even a subscriber that is temporarily partitioned away eventually receives a published message.
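The DAC abstraction can be pictured with a minimal interface sketch (hypothetical names; the actual DACE interfaces are richer and typed by topic):

    // Publishing is an insertion into a distributed collection; a
    // subscription registers interest in future insertions.
    interface Notifiable {
        void notify(Object published);   // callback on each publication
    }

    interface DistributedAsynchronousCollection {
        void insert(Object element);          // publish on this topic
        void subscribe(Notifiable consumer);  // consume asynchronously
    }

Framing publish/subscribe as an asynchronous collection is what lets the different variations of the paradigm be expressed as variations of one object-oriented abstraction.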

Journal ArticleDOI
TL;DR: This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler and provides further evidence of the performance improvement that is possible using this dynamic allocation strategy.
Abstract: Parallel applications typically do not perform well in a multiprogrammed environment that uses time-sharing to allocate processor resources to the applications' parallel threads. Co-scheduling related parallel threads, or statically partitioning the system, often can reduce the applications' execution times, but at the expense of reducing the overall system utilization. To address this problem, there has been increasing interest in dynamically allocating processors to applications based on their resource demands and the dynamically varying system load. The Loop-Level Process Control (LLPC) policy (Yue K, Lilja D. Efficient execution of parallel applications in multiprogrammed multiprocessor systems. 10th International Parallel Processing Symposium, 1996; 448–456) dynamically adjusts the number of threads an application is allowed to execute based on the application's available parallelism and the overall system load. This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler, and provides further evidence of the performance improvement that is possible using this dynamic allocation strategy. In this implementation, applications are automatically parallelized and enhanced with the appropriate LLPC hooks so that each application interacts with the modified version of the Solaris operating system. The parallelism of the applications is then adjusted automatically when they are executed in a multiprogrammed environment, so that all applications obtain a fair share of the total processing resources. Copyright © 2001 John Wiley & Sons, Ltd.
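The flavour of the policy (not its exact formula) can be conveyed in a few lines: before a parallel loop executes, the number of threads is chosen from the loop's available parallelism and the current system load, so that the machine is shared fairly instead of being oversubscribed. The method below is a hypothetical illustration:

    public class LoopControl {
        // Pick a thread count for the next parallel loop; all names and
        // the fair-share heuristic are invented for illustration.
        static int threadsForLoop(int loopParallelism, int processors,
                                  int competingApplications) {
            // Estimate this application's fair share of the processors.
            int fairShare = Math.max(1, processors / Math.max(1, competingApplications));
            // Never use more threads than the loop can exploit.
            return Math.min(loopParallelism, fairShare);
        }
    }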

Journal ArticleDOI
TL;DR: By using an example method, it is shown that the proposed approach increases the adaptability and reusability of design models through the application of fuzzy‐logic‐based techniques.
Abstract: While developing systems, software engineers generally have to deal with a large number of design alternatives. Current object-oriented methods aim to eliminate design alternatives whenever they are generated. Alternatives, however, should be eliminated only when sufficient information to take such a decision is available. Otherwise, alternatives have to be preserved to allow further refinements along the development process. Eliminating alternatives too early results in loss of information and excessive restriction of the design space. This paper aims to enhance the current object-oriented methods by modeling and controlling the design alternatives through the application of fuzzy-logic-based techniques. By using an example method, it is shown that the proposed approach increases the adaptability and reusability of design models. The method has been implemented and tested in our experimental CASE environment.

Journal ArticleDOI
TL;DR: This paper describes the definition and implementation of an OpenMP‐like set of directives and library routines for shared memory parallel programming in Java, and presents a prototype implementation, consisting of a compiler and a runtime library, both written entirely in Java.
Abstract: This paper describes the definition and implementation of an OpenMP-like set of directives and library routines for shared memory parallel programming in Java. A specification of the directives and routines is proposed and discussed. A prototype implementation, consisting of a compiler and a runtime library, both written entirely in Java, is presented; it implements most of the proposed specification. Some preliminary performance results are reported.
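A minimal example in the style of such directives is shown below. The directive spelling follows the JOMP prototype's comment-based syntax ('//omp'); the exact form in the paper's specification may differ. A useful property of comment-based directives is that the same file compiles unchanged with a plain Java compiler, where the loop simply runs sequentially:

    public class Daxpy {
        public static void main(String[] args) {
            int n = 1000000;
            double[] x = new double[n], y = new double[n];
            double a = 2.0;
            //omp parallel for
            for (int i = 0; i < n; i++) {
                y[i] += a * x[i];
            }
        }
    }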

Journal ArticleDOI
TL;DR: A model for the representation and retrieval of structured documents considering their temporal properties is presented, consisting of both a new data model and a query language that are specially adapted to the requirements of digital library applications.
Abstract: This paper presents a model for the representation and retrieval of structured documents that takes their temporal properties into account. The purpose of this model is to serve as a platform for the development of digital library applications. Thus, it consists of both a new data model and a query language that are specially adapted to the requirements of these applications. The main elements of the data model are a flexible type system for structured documents, and two temporal dimensions that represent the temporal properties of documents and the evolution of the database schema. The query language allows the retrieval of documents by specifying conditions on their structure, contents and temporal properties. It has been designed for exploiting the temporal information stored in a large digital library, making it possible to relate document contents in time, as well as to analyse the evolution of topics. The paper also includes some guidelines for the efficient implementation of databases of structured documents that adopt the proposed data and query models. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper reports on three parallel algorithms used for 3D reconstruction of asymmetric objects from their 2D projections and discusses their computational, communication, I/O, and space requirements and presents some performance data.
Abstract: The 3D electron‐density determination of viruses, from experimental data provided by electron microscopy, is a data‐intensive computation that requires the use of clusters of PCs or parallel computers. In this paper we report on three parallel algorithms used for 3D reconstruction of asymmetric objects from their 2D projections. We discuss their computational, communication, I/O, and space requirements and present some performance data. The algorithms are general and can be used for 3D reconstruction of asymmetric objects for applications other than structural biology. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The authors propose a bytecode verifier that posts subtype loading constraints when checking assignment compatibility of class types, addressing type-spoofing bugs uncovered in the bytecode verifier of Sun's Java 2 SDK 1.2.
Abstract: In the course of our work in developing formal specifications for components of the Java Virtual Machine (JVM), we have uncovered subtle bugs in the bytecode verifier of Sun's Java 2 SDK 1.2. These bugs, which lead to type safety violations, relate to the naming of reference types. Under certain circumstances, these names can be spoofed through delegating class loaders. These flaws expose some inaccuracies and ambiguities in the JVM specification. We propose several solutions to all of these bugs. In particular, we propose a general solution that makes use of subtype loading constraints. Such constraints complement the equality loading constraints introduced in the Java 2 Platform, and are posted by the bytecode verifier when checking assignment compatibility of class types. By posting constraints instead of resolving and loading classes, the bytecode verifier in our solution has a cleaner interface with the rest of the JVM, and allows lazier loading. We sketch some excerpts of our mathematical formalization of this approach and of its type safety results. Copyright © 2001 John Wiley & Sons, Ltd.
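The naming problem behind these bugs can be illustrated directly in Java. Two loaders may each define a class with the same fully qualified name; the runtime types are distinct, so any component (such as the verifier) that reasons about names alone can be misled. The directory URLs below are placeholders, and each would have to contain its own Spoofed.class:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class LoaderDemo {
        public static void main(String[] args) throws Exception {
            ClassLoader l1 = new URLClassLoader(new URL[] { new URL("file:/dirA/") });
            ClassLoader l2 = new URLClassLoader(new URL[] { new URL("file:/dirB/") });
            Class<?> c1 = Class.forName("Spoofed", false, l1);
            Class<?> c2 = Class.forName("Spoofed", false, l2);
            System.out.println(c1 == c2);  // false: same name, distinct types
        }
    }

Posting a loading constraint such as "the class named Spoofed seen by l1 must be a subtype of the one seen by l2" defers resolution while still recording exactly what the verifier assumed.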

Journal ArticleDOI
TL;DR: This work analyzed the impact of different redistribution strategies on the performance of parallel FFT, on various machine architectures and found that some redistribution strategies were consistently superior, while some others were unexpectedly inferior.
Abstract: The best approach to parallelize multidimensional FFT algorithms has long been under debate. Distributed transposes are widely used, but they also vary in communication policies and hence performance. In this work we analyze the impact of different redistribution strategies on the performance of parallel FFT, on various machine architectures. We found that some redistribution strategies were consistently superior, while some others were unexpectedly inferior. An in-depth investigation into the reasons for this behavior is included in this work. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: HBench:Java is presented, an application‐specific benchmarking framework that uses vectors to characterize the application and the underlying Java Virtual Machine (JVM) and carefully combines the two vectors to form a single metric that reflects a specific application's performance on a particular JVM such that the performance of multiple JVMs can be realistically compared.
Abstract: Java applications represent a broad class of programs, ranging from programs running on embedded products to high-performance server applications. Standard Java benchmarks ignore this fact and assume a fixed workload. When an actual application's behavior differs from that included in a standard benchmark, the benchmark results are useless, if not misleading. In this paper, we present HBench:Java, an application-specific benchmarking framework, based on the concept that a system's performance must be measured in the context of the application of interest. HBench:Java employs a methodology that uses vectors to characterize the application and the underlying Java Virtual Machine (JVM) and carefully combines the two vectors to form a single metric that reflects a specific application's performance on a particular JVM such that the performance of multiple JVMs can be realistically compared. Our performance results demonstrate HBench:Java's superiority over traditional benchmarking approaches in predicting relative performance of real applications and its ability to pinpoint performance problems, even with a simplified vector. Copyright © 2001 John Wiley & Sons, Ltd.
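The vector combination can be pictured, in simplified form, as a dot product (the paper's actual metric is more refined):

    \[
    T_{\text{predicted}} \;=\; \mathbf{v}_{\text{app}} \cdot \mathbf{v}_{\text{JVM}}
    \;=\; \sum_{i} n_i\, t_i,
    \]

where $n_i$ counts how often the application exercises primitive operation $i$ and $t_i$ is the measured cost of that primitive on the JVM under test. Ranking JVMs by $T_{\text{predicted}}$ then reflects the workload of the application of interest rather than that of a fixed benchmark suite.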

Journal ArticleDOI
TL;DR: The purpose of this paper is to investigate the scalability and performance of seven simple OpenMP test programs and to compare their performance with that of equivalent MPI programs on an SGI Origin 2000.
Abstract: The purpose of this paper is to investigate the scalability and performance of seven simple OpenMP test programs and to compare their performance with that of equivalent MPI programs on an SGI Origin 2000. Data distribution directives were used to make sure that the OpenMP implementation had the same data distribution as the MPI implementation. For the matrix-times-vector (test 5) and the matrix-times-matrix (test 7) tests, the syntax allowed in OpenMP 1.1 does not allow OpenMP compilers to generate efficient code, since the reduction clause is not currently allowed for arrays. (This problem is corrected in OpenMP 2.0.) For the remaining five tests, the OpenMP version performed and scaled significantly better than the corresponding MPI implementation, except for the right shift test (test 2) for a small message. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: Methods of deriving a parallel version of Stone's Strongly Implicit Procedure for solving sparse linear equations arising from finite difference approximations to partial differential equations (PDEs) are described, and a red-black ordering of grid points is shown to be far more efficient than a parallel wavefront approach.
Abstract: In this paper, we describe various methods of deriving a parallel version of Stone's Strongly Implicit Procedure (SIP) for solving sparse linear equations arising from finite difference approximations to partial differential equations (PDEs). Sequential versions of this algorithm have been very successful in solving semiconductor, heat conduction and flow simulation problems, and an efficient parallel version would enable much larger simulations to be run. An initial investigation of various parallelizing strategies was undertaken using a version of High Performance Fortran (HPF), and the best methods were reprogrammed using the MPI message passing libraries for increased efficiency. Early attempts concentrated on developing a parallel version of the characteristic wavefront computation pattern of the existing sequential SIP code. However, a red-black ordering of grid points, similar to that used in parallel versions of the Gauss-Seidel algorithm, is shown to be far more efficient. The results of both the wavefront and red-black MPI-based algorithms are reported for various problem sizes and numbers of processors on a sixteen-node IBM SP2.
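Red-black ordering colours grid point (i, j) by the parity of i + j; each point of one colour depends only on neighbours of the other colour, so all points of a colour can be updated concurrently. The sketch below shows the dependency structure for a five-point stencil with a Gauss-Seidel-style update (the SIP update uses the same ordering but different coefficients):

    public class RedBlack {
        // One red-black sweep over an n x n grid with spacing h.
        static void sweep(double[][] u, double[][] f, double h) {
            int n = u.length;
            for (int colour = 0; colour <= 1; colour++)   // 0 = red, 1 = black
                for (int i = 1; i < n - 1; i++)
                    for (int j = 1; j < n - 1; j++)
                        if ((i + j) % 2 == colour)
                            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1] - h * h * f[i][j]);
        }
    }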

Journal ArticleDOI
TL;DR: Modern analysis techniques are applied to the important Message Passing Interface standard in order to obtain information useful in designing both application programmer interfaces for object-oriented languages and message passing systems.
Abstract: The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages and message passing systems. The recognition of 'Design Patterns' within MPI is an important insight of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface (MPI-1) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein.

Journal ArticleDOI
TL;DR: Access from both remote and local mobile components needs to be uniformly controlled when mobility is used for dynamic relocation of distributed components, and this requires integration of access control with dynamic probing of resource availability.
Abstract: Component mobility is an important enabling technology for the design of wide area pervasive applications, but it introduces new challenges in the critical aspect of access control. In particular, when mobility is used for dynamic relocation of distributed components, access from both remote and local mobile components needs to be uniformly controlled. The dynamic determination of execution location, possibly crossing multiple administrative authorities, requires dynamic establishment and enforcement of access control. The deployment over widely heterogeneous hosts and devices requires integration of access control with dynamic probing of resource availability so as to influence the relocation process. This paper presents a model for dynamic specification and enforcement of access control in the context of dynamically relocatable components, and an implementation in the Java-based FarGo framework. The specification follows a negotiation-based protocol that enables dynamic matching of available and required resources by providers and consumers, respectively. Enforcement is provided through a capability-based secure component reference architecture, which uniformly applies to both local and remote references, and through instance-level, as opposed to type-level (supported in Java), access control. Finally, access control is integrated into the programming model in a non-intrusive fashion, by separating the encoding of access control from the encoding of the logic of the application. Copyright © 2001 John Wiley & Sons, Ltd.