
Showing papers in "Scalable Computing: Practice and Experience" in 2000


Journal ArticleDOI
TL;DR: The book's main purpose is to update designers and users of parallel numerical algorithms on the latest research in the field, presenting novel ideas, results and work in progress and advancing state-of-the-art techniques in parallel and distributed computing for numerical and computational optimization problems in scientific and engineering applications.
Abstract: Edited by Tianruo Yang Kluwer Academic Publisher, Dordrecht, Netherlands, 1999, 248 pp. ISBN 0-7923-8588-8, $135.00 This book contains a selection of contributed and invited papers presented at the workshop Frontiers of Parallel Numerical Computations and Applications, held within the IEEE 7th Symposium on the Frontiers of Massively Parallel Computers (Frontiers '99) at Annapolis, Maryland, February 20-25, 1999. Its main purpose is to update the designers and users of parallel numerical algorithms with the latest research in the field. A broad spectrum of topics on parallel numerical computations, with applications to some of the more challenging engineering problems, is covered. Parallel algorithm designers and engineers who make extensive use of parallel numerical computations, as well as graduate students in Computer Science, Scientific Computing, various engineering fields and applied mathematics, should benefit from reading it. The first part is addressed to a larger audience and presents papers on parallel numerical algorithms. Two new libraries are presented: PSPASES and PoLAPACK. PSPASES is a collection of parallel direct solvers for sparse symmetric positive definite linear systems, characterized by high performance and good scalability. The PoLAPACK library contains LU and QR codes based on a new blocking strategy that guarantees good performance regardless of the physical block size. Next, an efficient approach to solving stiff ordinary differential equations by the diagonal implicitly iterated Runge-Kutta (DIIRK) method is described. DIIRK lends itself to a fast parallel implementation due to a reduced number of function evaluations and an automatic stepsize control mechanism. Finally, minimization of sufficiently smooth non-linear functionals is sought via parallel space decomposition. Here, a theoretical background of the problem and two equivalent algorithms are presented. New research directions for classical solvers are treated in the next three papers: first, reduction of the global synchronization in the biconjugate gradient method; second, a new, more efficient Jacobi ordering for multiple-port hypercubes; and finally, an analysis of the theoretical performance of an improved version of the quasi-minimal residual method. Parallel numerical applications constitute the second part of the book, with results from fluid mechanics, material sciences, applications to signal and image processing, dynamic systems, semiconductor technology and electronic circuits and systems design. With one exception, the authors expose in detail the parallel implementations of the algorithms and numerical results. First, a 3D-elasticity problem is solved using an additive overlapping domain decomposition algorithm. Second, an overlapping mesh technique is used in a parallel solver for the compressible flow problem. Then, a parallel version of a complex numerical algorithm to solve a lubrication problem studied in tribology is introduced. Next, a timid approach to parallel computing of the cavity flow by the finite element method is presented. The problem solved is rather small for today's needs and only up to 6 processors are used. This is also the only paper that does not present results from numerical experiments.
The remaining applications discussed in the subsequent chapters are: a large-scale multidisciplinary design optimization problem with application to the design of a supersonic commercial aircraft, a report on progress in the parallel solution of an electromagnetic scattering problem using boundary integral methods, and an optimal solution to the convection-diffusion equation modeling the concentration of a pollutant in the air. The book is of definite interest to readers who keep up to date with parallel numerical computation research. The main purpose, to present novel ideas, results and work in progress and to advance state-of-the-art techniques in the area of parallel and distributed computing for numerical and computational optimization problems in scientific and engineering applications, is clearly achieved. However, due to its content it cannot serve as a textbook for a computer science or engineering class. Overall, it is a reference-type book to be kept by specialists and in a library rather than a book to be purchased for self-introduction to the field. Most of the papers presented are results of ongoing research and so they rely heavily on previous results. On the other hand, with only one exception, the results presented in the papers are a great source of information for researchers currently involved in the field. Michelle Pal, Los Alamos National Laboratory

4,696 citations


Journal ArticleDOI
TL;DR: Yair Censor and Stavros A. Zenios, Oxford University Press, New York, 1997, 539 pp.
Abstract: Yair Censor and Stavros A. Zenios, Oxford University Press, New York, 1997, 539 pp., ISBN 0-19-510062-X, $85.00

486 citations


Journal ArticleDOI
TL;DR: This book is an introductory text to the ideas, concepts, and topics of concurrency and provides a systematic treatment of concepts as a means to rigorously specify and model concurrent systems, with Java examples to animate and illustrate the concepts discussed.
Abstract: Jeff Magee and Jeff Kramer John Wiley and Sons, New York, NY, 1999, 374 pp. ISBN 0471987107, $64.99 The book is an introductory text to the ideas, concepts, and topics of concurrency. The text focuses on understanding the concepts, techniques, and problems of concurrency, not on implementation or language. The authors' overall goal is that a combination of learning and doing, using Java, should make the process of acquiring the skills relating to concurrency interesting, enjoyable, and challenging. The Java language is thus used only for illustration and for the programming experience of the reader. The book is intended for a computer science student or software developer. A background in programming is expected, as well as some familiarity with the essential concepts of object-oriented programming. Knowledge of the Java programming language and of operating systems concepts is beneficial but not required. After reading the text, a reader can expect to have a broad understanding of the concepts of concurrency, the problems that arise, methods to ensure desirable properties in a concurrent system, and ways to avoid the undesirable ones. The text provides a systematic treatment of concepts as a means to rigorously specify and model concurrent systems, with Java examples to animate and illustrate the concepts discussed. The book is organized into twelve chapters, with three appendices. The first eight chapters provide a concise and comprehensive foundation of concurrency, with the remaining four chapters focusing on more advanced concepts. The latter four chapters are somewhat supplemental and can be read at the discretion of the reader. The content of the text is self-contained, with no reliance upon external references. The authors provide a concluding set of notes and further topics which reference external works, but these are not used in the chapter contents. The introduction provides the foundation for the book and an idea of what will follow in the rest of the text. The basics of modeling and Java are covered. The authors follow with the concepts of processes and threads and the idea of concurrency. The emphasis is on using Java and on the notation used to define a concurrent process or thread. The discussion continues by covering interleaving and asynchronous concurrency. Various details about the composition of parallel processes and finite state process modeling are presented. The programming of concurrent execution using Java is then examined via the concept of multiple threads. Here, sharing objects among Java threads is also discussed. The concept of conditional synchronization is covered next and is used to construct a monitor. The methods of the base Java Object class are used, and the waiting queue associated with a Java object is discussed and implemented. This discussion is followed by an introduction to deadlock. The authors show the conditions necessary for a deadlock to occur and present an analysis of a concurrent system establishing that such a system is deadlock-free. The classic Dining Philosophers problem is used to illustrate the distinction between finding deadlocks in models and in programs. The presentation continues with the safety property (which asserts that nothing bad happens) and the liveness property (which asserts that something good eventually happens). Here, the authors use the material on model-based design to consolidate many of the concepts previously discussed. No one modeling method or design methodology is emphasized.
The creation and elimination of threads in a running program is covered next. The dynamic starting and stopping of Java threads is presented. The discussion then moves to message passing. Asynchronous and synchronous messaging is presented with Java programming examples. The concept of a rendezvous, allowing for interaction in a messaging system, is also presented. Based on the concepts developed so far, filter-pipeline, supervisor-worker, and announcer-listener concurrent architectures are examined. Java programs demonstrating how the elements of each model interact at runtime are developed. The presentation concludes with the modeling and implementation of timed systems. Time is signaled by the passage of successive time ticks. The authors then elaborate on a system that is event-based rather than thread-based. Java programs are used to illustrate the implementation of the concepts discussed. The three appendices provide a reference, specification, and listing of semantics for the finite-state modeling algebra used by the authors throughout the book. The text is ideal for teaching in a classroom environment, or for a motivated reader to use for self-paced learning. Its organization follows a logical train of discussion and thought, with each chapter building successively on the previous ones and the more difficult concepts and topics withheld until later. This gradual approach makes the material flow smoothly and cohesively from beginning to end. The book is intended as a textbook and organizes the material and discussion as one, but it never becomes so dry or bland as to stifle the reader's interest in the material. The examples are intermixed with the discussion to highlight and emphasize the essential points the authors make in each chapter. The text illustrates the concepts under discussion well and does not become a long-winded recital of theory. The topics examined are fundamental to concurrency and are well illustrated and discussed, but no one topic is examined extensively. The code examples are well written with regard to programming style, indentation and comments. A CD is provided, containing the code from the book. In addition, supplementary material consisting of presentation slides and an errata is provided on the authors' WWW site, as well as on the CD. The book is up to date both on the topics of concurrency and on the Java applets, code, and programs used to illustrate them. Later, as the Java language evolves, the examples may need to be updated, but for the time being the text represents the state of the Java language well. William F. Gilreath, Micron Technology, Inc., Boise, Idaho
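To make the monitor discussion concrete, here is a minimal Java sketch (an illustrative example, not code from the book) of conditional synchronization using the wait queue of a Java object: a bounded buffer whose put and get methods block until their guarding condition holds.

```java
// Minimal monitor example with conditional synchronization.
// Illustrative sketch, not taken from Magee and Kramer's book.
public class BoundedBuffer {
    private final Object[] items;
    private int count = 0, in = 0, out = 0;

    public BoundedBuffer(int capacity) { items = new Object[capacity]; }

    // Block while the buffer is full, then deposit an item.
    public synchronized void put(Object item) throws InterruptedException {
        while (count == items.length) {
            wait();                    // join the object's waiting queue
        }
        items[in] = item;
        in = (in + 1) % items.length;
        count++;
        notifyAll();                   // wake threads blocked in get()
    }

    // Block while the buffer is empty, then remove an item.
    public synchronized Object get() throws InterruptedException {
        while (count == 0) {
            wait();
        }
        Object item = items[out];
        out = (out + 1) % items.length;
        count--;
        notifyAll();                   // wake threads blocked in put()
        return item;
    }
}
```

The guarded while loops around wait() and the notifyAll() calls are the Java realization of the monitor concept that the book first specifies with finite state processes.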

211 citations


Journal ArticleDOI
TL;DR: This book provides an in-depth view of one possible Beowulf system covering details of hardware selection, operating system configuration, communication software and a parallel sorting application and basically fulfills the stated purpose, but the subject matter is too broad for a single book.
Abstract: Thomas L. Sterling, John Salmon, Donald J. Becker, Daniel F. Savarese MIT Press, Cambridge, MA, 1999, 250 pp. ISBN: 026269218X, $31.95 The authors of this book have attempted to describe in 232 pages a subject which is difficult to fit into a book and which is rapidly evolving. Their stated purpose is "enabling, facilitating, and accelerating the adoption of the Beowulf model of distributed computing." They provide an in-depth view of one possible Beowulf system, covering details of hardware selection, operating system configuration, communication software and a parallel sorting application. Overall the book is informative enough to convince people of the value of using Beowulf clustering and basically fulfills the stated purpose, but the subject matter is too broad for a single book. The book begins with an overview of Beowulf systems, including some background material on parallel computers, and outlines the rest of the book. The discussion of parallel computers in general is quite brief. The authors point out the importance of the recent increases in performance of mass-market computers, which is the reason why Beowulf clusters exist. They continue this section by giving an overview of the hardware and software components of a Beowulf cluster. Following the introduction is a discussion of the hardware elements used to build a typical Beowulf cluster. This discussion is nicely written and introduces topics such as the PCI bus, types of memory, and motherboards. The details are useful now, although hardware design evolves too rapidly for this discussion to be as useful several years from now. Next, the authors introduce the Linux operating system. This discussion is not detailed enough to guide a user through installation and configuration, but it does provide an overview and it includes a useful list of references, both printed and electronic. Following the Linux discussion, the authors sketch networking-related issues. They discuss most available network hardware solutions and discuss Ethernet in sufficient detail to understand its performance. Their discussion of TCP/IP is detailed enough to understand IP addressing but does not include a discussion of routing. Overall, this discussion is a mixture of overviews of topics and detailed discussions. The authors provide a useful discussion of how to manage a Beowulf cluster. They suggest practical methods of cloning nodes using Linux tools and they describe methods for day-to-day administration. This discussion is possibly the most useful contribution of the book. Most of the book consists of topics covered in more detail in on-line resources, but managing a Beowulf cluster is a little off the beaten path. Next, the authors discuss parallelism. This includes a categorization of parallel algorithms along with an introduction to a variety of parallel performance metrics. They also give a useful introduction to MPI, including a nice exposition of a sorting example. This example provides a good illustration of MPI programming along with a useful analysis of the performance characteristics of the application. Overall, the authors present a useful discussion of Beowulf clusters. Most topics are discussed lightly while a few are discussed in depth. The reader should not expect this book to answer every question encountered in configuring and using a Beowulf cluster.
Instead, this book offers an overview of the process; details can be obtained by consulting manual pages, Linux Howto documents and on-line resources about Linux and MPI. Benjamin R. Seyfarth, University of Southern Mississippi

150 citations


Journal ArticleDOI
TL;DR: The paper discusses assessment of the performance characteristics of distributed software architectures using the Software Performance Engineering (SPE) approach, and describes the information required to perform such assessments, particularly information about synchronization points and types of synchronization mechanisms, and the modeling approach.
Abstract: Distributed systems were once the exception, constructed only rarely and with great difficulty by developers who spent significant amounts of time mastering the technology. Now, as modern software technologies have made distributed systems easier to construct, they have become the norm. Unfortunately, many distributed systems fail to meet their performance objectives when they are initially constructed. Others perform adequately with a small number of users but do not scale to support increased usage. These performance failures result in damaged customer relations, lost productivity for users, lost revenue, cost overruns due to tuning or redesign, and missed market windows. Our experience is that most performance failures are due to a lack of consideration of performance issues early in the development process, in the architectural phase. This paper discusses assessment of the performance characteristics of distributed software architectures using the Software Performance Engineering (SPE) approach. We describe the information required to perform such assessments, particularly the information about synchronization points and types of synchronization mechanisms, and the modeling approach. The case study demonstrates how to construct performance models for distributed systems and illustrates how simple models of software architectures are sufficient for early identification of performance problems.

42 citations


Journal ArticleDOI
TL;DR: With a level of expertise simply unmatched in the field, Distributed System Design provides readers with a solid foundation to understand and further explore this increasingly important area of technology.
Abstract: Jie Wu CRC Press Publishing Company, Boca Raton, FL, 1998, 496 pp. ISBN 0-8493-3178-1, $74.95 Distributed System Design is a superbly organized, comprehensive exposition of basic concepts, issues, and some possible solutions in distributed systems design. The highly regarded author has meticulously selected material from original sources in the contemporary literature and new contributed papers by preeminent researchers in the field, plus his own research results, to provide state-of-the-art discussions of various important issues in distributed systems design. Graduate and senior undergraduate students in distributed systems design or advanced operating systems, as well as computer professionals analyzing and designing distributed, open, or parallel systems, will find key insights into current trends and solutions of distributed systems likely to shape future directions of this important field. From the beginning, the author motivates the study of this subject well by pointing out that future requirements for computing speed, system reliability, and cost-effectiveness entail the development of alternative computers to replace the traditional von Neumann organization. As computing networks come into being, one of the latest dreams is now possible: distributed computing. The twelve chapters of this book can be roughly divided into three parts. Part I (Chapters 1 to 3) introduces the necessary background material and foundations in terms of the distributed system model, distributed programming languages and formal approaches to distributed systems design. The background material is comprehensive and the model that the author uses throughout the book is emphasized. This part gives the reader a perspective on past accomplishments and a global picture of the area. Part II (Chapters 4 to 11) addresses various important issues in distributed systems design such as mutual exclusion, deadlock, interprocessor communication mechanisms, reliability, static and dynamic load distribution and distributed data management. Each chapter concentrates on software elements of design that emphasize performance, flexibility, fault tolerance, and scalability. Each issue is put forward clearly by descriptions and diagrams. Possible solutions are presented with algorithms and explicit examples. Ample references are provided for further exploration of the issues. This part is beneficial to students and researchers working on research projects in distributed systems. Part III (Chapter 12) concentrates on the applications of distributed design in operating systems, file systems, shared memory systems, database systems and heterogeneous processing, which further demonstrate the significance of distributed systems. In addition, future research directions are listed that provide readers with the trends in this field. The exercises after each chapter are original, challenging and pertinent to the content just covered. They have a nice mix of theory, analysis, and design, which not only helps readers refresh their knowledge but also widens their view of the subject. In summary, this book is a well-organized and thoroughly developed text with plenty of current issues and research results in distributed systems explained with supportive examples and illustrations. The major sections of the book are well ordered, as is each individual chapter.
In a word, with a level of expertise simply unmatched in the field, Distributed System Design provides readers with a solid foundation to understand and further explore this increasingly important area of technology. Xiao Chen, Southwest Texas State University

38 citations


Journal ArticleDOI
TL;DR: The paper presents how the probe effect can be eliminated and the overhead can be minimised during replay based debugging, and proposes a hierarchical fault-tolerant multicast discovery scheme.
Abstract: Distributed and Parallel Systems A selection of the best papers of the 4th Austrian-Hungarian Workshop on Distributed and Parallel Systems is presented in this issue. The series of workshops started as a local meeting in 1992 and has grown into an internationally acclaimed event on computing techniques, covering not just parallel and distributed programming in the classical sense but also emerging topics like ubiquitous and pervasive computing, cluster and grid technology, multimedia and challenging applications. Thoai et al. focus on a fundamental problem of parallel program development: debugging. Since the execution of parallel programs is nondeterministic, forcing a certain execution trace among the many possible ones requires sophisticated replay techniques. The paper presents how the probe effect can be eliminated and the overhead minimised during replay-based debugging. The paper by Lovas et al. presents a parallel implementation of an ultra-short range weather prediction method, supported by a graphical development tool, P-GRADE. The paper introduces all stages of the program development from editing to performance analysis. A novel approach for resource discovery is presented by Juhász et al. A grid system should be able to provide brokering services for potentially thousands and millions of resources. Most approaches nowadays are either centralised or flat and as such are not really scalable. The paper proposes a hierarchical fault-tolerant multicast discovery scheme. The paper by Heinzlreiter et al. presents a grid middleware that enables the realisation of novel interactive visualisation systems in grid environments. Tools like the Grid Visualisation Kernel (GVK) are leading towards pioneering grid-based Virtual Reality applications. Fault diagnosis is the central issue of the paper by Polgár et al. They propose a modification of the P-graph model in order to improve fault diagnosis in complex multiprocessor systems. Bósa et al. introduce advanced fault-tolerance mechanisms for Distributed Maple, a parallel computer algebra system. The tolerance of failed links and nodes is enhanced by adding reconnection and restart mechanisms as well as by changing the virtual root node in order to avoid overall failure. Emerging mobile applications raise the issue of context awareness. Ferscha et al. introduce the techniques related to context sensing, representation and delivery and propose a new approach for context-based mobile computing. Goldschmidt et al. analyse the requirements of adaptive multimedia servers where the dynamic migration of multimedia applications is supported, and an agent-based infrastructure is proposed. The idea is supported by a specification and implementation of a CORBA-based interface. The progress of multimedia over the Internet raises the need for intelligent video caches. The paper by Schojer et al. introduces a proxy cache that allows fast and efficient adaptation of video based on the MPEG-4 standard. These papers demonstrate the wide spectrum of the workshop topics: from distributed computing via grids towards novel complex systems. Zsolt Németh, Dieter Kranzlmüller, Péter Kacsuk, Jens Volkert

20 citations


Journal ArticleDOI
TL;DR: The architecture and framework for a benchmark suite that has been developed as part of the DeSiDeRaTa project, useful for evaluation of the Quality of Service (QoS) management and/or Resource Management services in distributed real-time systems are presented.
Abstract: In this paper we present the architecture and framework for a benchmark suite that has been developed as part of the DeSiDeRaTa project. The benchmark suite is representative of the emerging generation of distributed, mission-critical, real-time control systems that operate in dynamic environments. Systems that operate in such environments may have unknown worst-case scenarios, may have large variances in the sizes of the data and event sets that they process (and thus, have large variances in execution latencies and resource requirements), and may be very difficult to characterize statically, even by time-invariant statistical distributions. The benchmark suite (called DynBench) is useful for evaluation of the Quality of Service (QoS) management and/or Resource Management (RM) services in distributed real-time systems. As such, DynBench includes a set of performance metrics for the evaluation of the QoS and RM technologies in dynamic, distributed real-time systems. The paper demonstrates the successful application of DynBench in evaluation of the DeSiDeRaTa QoS management middleware.

17 citations


Journal ArticleDOI
TL;DR: The paper provides a generalization of the previous algorithm for the parallel multiplication of a vector by a Kronecker product of matrices, and shows that the multiplication requires at least Θ(log p) communication steps, assuming that there is no computation redundancy.
Abstract: The paper provides a generalization of our previous algorithm for the parallel multiplication of a vector by a Kronecker product of matrices. For any p, a factor of the problem size, our algorithm runs on p processors with a minimum number of communication steps and memory space. Specifically, on p processors with global communication, we show that the multiplication requires at least Θ(log p) communication steps, assuming that there is no computation redundancy. This complexity is revised according to the underlying topology, and some performance results on the CRAY T3E are given.
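For reference, the operation being parallelized never requires forming the Kronecker product explicitly; the following standard identities (illustrative textbook notation, not necessarily the paper's own) underlie such algorithms, since the factored form can be applied as a sequence of small dense multiplications distributed over the processors:

```latex
% Standard vec/Kronecker identities (illustrative; the paper's notation and
% data distribution may differ).
(A \otimes B)\,x = \operatorname{vec}\!\bigl(B \, X \, A^{\mathsf T}\bigr),
\qquad x = \operatorname{vec}(X),
\qquad
\bigotimes_{i=1}^{k} A_i
  = \prod_{i=1}^{k}
    \bigl(I_{n_1} \otimes \cdots \otimes I_{n_{i-1}} \otimes A_i \otimes I_{n_{i+1}} \otimes \cdots \otimes I_{n_k}\bigr).
```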

17 citations


Journal ArticleDOI
TL;DR: These two books, which were published at almost the same time, are addressed to a relatively large audience and may be of interest to people working on parallel optimization algorithms, as well as to operations research and computer science students.
Abstract: Edited by Athanasios Migdalas, Panos M. Pardalos and Sverre Story, Kluwer Academic Publishers, Dordrecht, 1997, 585 pp., ISBN 0-7923-4583-5, $319.50 These two books, which were published at almost the same time, are addressed to a relatively large audience. They may be of interest to people working on parallel optimization algorithms. The first book, in particular, may be of interest to readers involved in real-life applications of optimization modeling. In addition, operations research and computer science students may benefit from them. Both books may be used as textbooks for graduate courses in those specializations. However, some background in mathematical analysis is necessary as an introductory requirement. In my opinion, these two books together constitute an extensive state-of-the-art summary of parallel optimization. The book by Censor and Zenios is much more consistent and homogeneous and contains, among other things, a description of a relatively new approach based on generalized projections. The second book is very carefully edited and contains a set of separate, although well-selected, papers. The first book is devoted to the very important question of how to use parallelism when solving large-scale optimization models. The authors consider two main approaches. The first, a relatively new one, relies on sequential orthogonalization and its generalization based on generalized Bregman projections. Algorithms belonging to that group facilitate parallel computations due to the structure of their operations. The second approach exploits the structure of the model and the sparsity patterns often existing in large-scale optimization models. Such structures arise, for instance, in a natural way in large spatial systems (in transportation or telecommunication problems). The authors have investigated decomposition algorithms based on linearization or diagonal-quadratic approximations. In that way, they have obtained an algorithm with a relatively simple coordinating (master) problem. Their third attempt to parallelize optimization algorithms concerns the primal-dual path-following algorithm (an example of the interior point algorithms). All interior point methods require at each step the solution of a linear system of equations involving the matrix AA^T, where A represents the linear constraint matrix in the linear or quadratic programming problem in question. The authors have shown how to exploit sparsity when solving that system of linear equations in parallel. The book is a nice combination of sound mathematical theory and applications. Part one of the book is devoted to the theory of generalized distances and projections and their use in proximal minimization applied to linear programming problems. It also contains some elements of the theory of penalty, barrier and augmented Lagrangian methods. The second part describes iterative projection algorithms, model decomposition algorithms and interior point algorithms developed on the theoretical basis from part one. The third part presents optimization models for such problems as the matrix estimation problem, image reconstruction from projections, treatment planning in radiation therapy, multicommodity network flow problems, and planning under uncertainty. At the end, the reader finds a discussion of implementation issues and cited computational results. The second book is in some sense complementary to the first one.
Its range stretches from theoretical models for parallel algorithm design and their complexity, through a view of parallel computers through the eyes of an experienced programmer, to sparse linear systems arising in various optimization problems. It even contains a paper devoted to the variational inequalities problem. However, it does not even touch optimization modeling and real-life applications. This last feature is the strongest merit of the book by Censor and Zenios, whose first, theoretical part is motivated by the applications considered in part two. The book edited by Migdalas, Pardalos and Story covers a broader scope of optimization algorithms, including, for instance, discrete and stochastic optimization problems and variational inequalities, which are not represented in the first book (restricted practically to linearly constrained continuous optimization problems). The first two chapters discuss theoretical models for parallel algorithm design and for their complexity. The third chapter presents a survey of current high-performance parallel computer architectures and discusses their performance bottlenecks. The fourth chapter is devoted to scalable parallel algorithms for sparse linear systems, and the fifth investigates automatic parallelization of the computation of the systems of ordinary differential equations arising in the 2D bearing mechanical problem. The next eight chapters are devoted to optimization problems and methods. Chapter six contains a survey of parallel algorithms for network problems and a thorough discussion of the implementation of a parallel solver for the traffic assignment problem. In chapter seven one finds a review of the sequential branch-and-bound method and a discussion of the problems connected with its parallelization. Chapter eight presents parallelization of heuristic methods for combinatorial optimization. Chapter nine contains an analysis of decomposition algorithms for differentiable optimization. Chapter ten describes parallel algorithms for finite-dimensional variational inequalities, and chapter eleven parallel algorithms for stochastic programming. Chapter twelve deals with heuristic algorithms for global optimization problems, while chapter thirteen presents logarithmic barrier function algorithms for neural network training. The first book, in my opinion, would be a valuable supplement to the private library of any person, student, researcher or practitioner interested in operations research and various aspects of parallel optimization. The second book is almost four times as expensive as the first one. In my opinion, its content does not justify such a big difference in price, especially since approximately one-third of its material may be found in general, much cheaper books on parallel computing. Andrzej Stachurski, Technical University of Warsaw
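For readers unfamiliar with the remark about AA^T above: in a primal-dual path-following method for linear programming, each Newton step is typically obtained from normal equations of the form shown below (a standard textbook formulation, not the book's own notation), so the sparsity pattern of AA^T can be analysed once and exploited at every iteration, which is what makes a parallel sparse solver worthwhile.

```latex
% Normal-equations system solved at each interior point iteration
% (standard form for linear programming; Theta changes every iteration).
\bigl(A \,\Theta\, A^{\mathsf T}\bigr)\,\Delta y = r,
\qquad \Theta = X S^{-1} \ \text{diagonal and positive definite.}
```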

13 citations


Journal ArticleDOI
TL;DR: Various aspects of reaction-diffusion computing are illustrated by actual examples of parallel solutions of various problems from computational geometry, optimization on graphs and communication networks, control of mobile robots and implementation of logical operations.
Abstract: Non-linear chemical and excitable media exhibit a wide range of space-time dynamics: from well-known circular waves to self-localized excitations. If we take a resting medium and change the concentration of reagents or other parameters, then diffusive or phase waves are generated and spread all over the medium. The waves interact with one another and form either a dissipative structure or a precipitate. All micro-volumes of the medium update their states (local concentrations of reagents) in parallel. Thus, the medium can be thought of as a massively parallel processor, where data and results of a computation are represented by concentration profiles of the reagents. A theory of reaction-diffusion processors is still under development. In the paper we give an account of our personal experience in the design of reaction-diffusion and excitable processors, their mathematical models and working prototypes. Various aspects of reaction-diffusion computing are illustrated by actual examples of parallel solutions of various problems from computational geometry, optimization on graphs and communication networks, control of mobile robots and implementation of logical operations. The prospective material base for the fabrication of reaction-diffusion and excitable processors is also tackled.

Journal ArticleDOI
TL;DR: The DISCWorld Remote Access Mechanism (DRAM) as discussed by the authors provides the user and system with a scalable abstraction over remote data and the operations that are possible on the data.
Abstract: Efficient, scalable remote access to data is a key aspect of wide area metacomputing environments. One of the limitations of current client-server computing models is their inability to create, retain and trade tokens which represent data or services on remote computers, along with the metadata to adequately describe the data or services. Most current client-server software systems require the user to submit all the data inputs that are needed for a remote operation, and after the operation is complete, all the resultant output data is returned to the originating client. Pipelining remote processes requires that data be retained at the remote site to achieve performance on high-latency wide area networks. We introduce the DISCWorld Remote Access Mechanism (DRAM), an integral component of our DISCWorld metacomputing environment, which provides the user and system with a scalable abstraction over remote data and the operations that are possible on the data. We present a formal notation for DRAMs and discuss the implementation and performance of DRAMs compared with traditional client-server systems.
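As a rough illustration of the idea (a hypothetical Java sketch, not the DISCWorld API), a client can hold a lightweight handle describing remote data and chain operations on it, so that intermediate results stay at the remote site and only the final result crosses the network:

```java
import java.util.Map;

// Hypothetical sketch of a token ("handle") for remote data in the spirit of
// the DRAM concept: the client keeps metadata locally and leaves the bulk
// data on the remote server until it is explicitly fetched.
interface RemoteDataHandle {
    Map<String, String> metadata();            // e.g. size, type, location
    RemoteDataHandle apply(String operation);  // run an operation remotely and
                                               // get a handle to the result
    byte[] fetch();                            // only now move data to the client
}

class PipelineClient {
    byte[] process(RemoteDataHandle input) {
        // Each step executes on the remote site; no intermediate result
        // travels over the wide area network.
        RemoteDataHandle filtered = input.apply("filter");
        RemoteDataHandle reduced  = filtered.apply("reduce");
        System.out.println("result size: " + reduced.metadata().get("size"));
        return reduced.fetch();                // a single transfer at the end
    }
}
```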

Journal ArticleDOI
TL;DR: An approach to improve the computational performance of genetic programming by exploiting parallelism at the level of evaluation of the individuals, based on the DCOM client-server model, is presented.
Abstract: We present an approach to the parallel distributed implementation of genetic programming, aimed at improving the computational performance of genetic programming by exploiting parallelism at the level of evaluation of the individuals. The approach is based on the DCOM client-server model. Using the DCOM paradigm offers advantages for a parallel distributed implementation of genetic programming, such as binary standardization, platform-, machine- and protocol-neutrality, and seamless integration with different Internet protocols. The developed implementation of genetic programming runs in LAN and/or Internet environments. A double-queued, multi-threaded architecture of the DCOM server is developed, aimed at extending the functionality of DCOM with features such as asynchronous communication while still implementing blocking-mode calls, and at reducing the communication overhead of empty calls. An implementation of batching, directed towards alleviating the communication overhead of empty calls, is also proposed. Analytically estimated and experimentally obtained performance evaluation results are discussed. The results show that clear superlinear speedup can be achieved in the presence of code growth in genetic programming.
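The core idea of parallelism at the level of individual evaluations can be sketched as follows (a generic Java thread-pool version, purely for illustration; the paper's actual implementation dispatches evaluations to remote DCOM servers rather than local threads):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic sketch of master-side parallel fitness evaluation in genetic
// programming. Illustrative only; not the paper's DCOM-based code.
class ParallelEvaluator {
    // An individual is some evolved program; fitness() is the expensive step.
    interface Individual { double fitness(); }

    private final ExecutorService workers =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    List<Double> evaluate(List<Individual> population) throws Exception {
        List<Future<Double>> pending = new ArrayList<>();
        for (Individual ind : population) {
            pending.add(workers.submit(ind::fitness)); // evaluations run concurrently
        }
        List<Double> fitnesses = new ArrayList<>();
        for (Future<Double> f : pending) {
            fitnesses.add(f.get());                    // selection waits for all results
        }
        return fitnesses;
    }
}
```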

Journal ArticleDOI
TL;DR: It is shown that this computer model allows for efficient implementation of parallel prefix computations, and a large variety of applications from different areas is presented to demonstrate how parallel prefix computations can be used as key operations for deriving efficient implementations on the KPROC.
Abstract: The KPROC (KiloPROCessor) architecture is the first implementation of a parallel computer with 1024 floating-point processors on a single chip. It strictly follows the concept of an instruction systolic array. The modular organisation allows for either building large arrays of many KPROC chips or speeding up small machines with a single KPROC as a coprocessor. This paper presents the concept of this parallel computer model as well as the architectural details of the processor design. It is shown that this computer model allows for efficient implementation of parallel prefix computations. A large variety of applications from different areas is presented to demonstrate how parallel prefix computations can be used as key operations for deriving efficient implementations on the KPROC.
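Since parallel prefix computations are singled out as the key primitive, the following generic sketch of the recursive-doubling scan pattern may help fix the idea (written as plain sequential Java; on a machine like the one described, every iteration of the inner loop would execute simultaneously on all elements, and this is not KPROC code):

```java
// Generic parallel-prefix (scan) pattern using recursive doubling: each pass
// doubles the span of inputs summed into every element, so log2(n) passes
// suffice. Illustrative only.
static int[] inclusivePrefixSum(int[] input) {
    int n = input.length;
    int[] current = input.clone();
    for (int stride = 1; stride < n; stride *= 2) {
        int[] next = current.clone();
        for (int i = stride; i < n; i++) {          // all i in parallel on real hardware
            next[i] = current[i] + current[i - stride];
        }
        current = next;
    }
    return current;   // current[i] == input[0] + ... + input[i]
}
```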

Journal ArticleDOI
TL;DR: This paper shows how it is feasible to efficiently perform large radiosity computations on a conventional (distributed) shared memory multiprocessor machine, and develops appropriate partitioning and scheduling techniques, that deliver an optimal load balancing, while still exhibiting excellent data locality.
Abstract: We show, in this paper, how it is feasible to efficiently perform large radiosity computations on a conventional (distributed) shared memory multiprocessor machine. Hierarchical radiosity algorithms, although computationally expensive, are an efficient view-independent way to compute the global illumination which gives the visual ambiance to a scene. Their effective parallelization is made challenging, however, by their non-uniform, dynamically changing characteristics, and their need for long-range communication. To address this need, we have developed appropriate partitioning and scheduling techniques that deliver optimal load balancing while still exhibiting excellent data locality. We provide the detailed implementation of these techniques and present results of experiments showing very good acceleration and scalability performance. The accurate radiosity solutions required to render high-quality images of an extremely large model are computed in a reasonable time. The rendering capabilities of modern graphics hardware are then used to visualize this virtual pre-lit environment in real time: a two-minute QuickTime movie example can be downloaded from our site: ftp://ftp.loria.fr/pub/loria/isa/Cavin/sodaHallWalk.qt.gz (last accessible in 2000)

Journal ArticleDOI
TL;DR: This paper focuses on the parallel aspects of the FPNA computation paradigm, from its definition to its applied implementation on FPGAs, to attest that a connectionist paradigm may represent an actually practical model of parallel computing.
Abstract: The distributed structure of artificial neural networks makes them stand as models of parallel computation. Their very fine grain parallelism uses many information exchanges, so that hardware implementations are more likely to fit neural computations. But the number of operators and the complex connection graph of most usual neural models cannot be directly handled by digital hardware devices. Therefore a theoretical and practical framework has been defined to reconcile simple hardware topologies with complex neural architectures. This framework has been designed mainly to meet the demands of configurable digital hardware. Field programmable neural arrays (FPNAs) are based on an original paradigm of neural computation, so that they compute complex neural functions despite their simplified architectures. This paper focuses on the parallel aspects of the FPNA computation paradigm, from its definition to its applied implementation on FPGAs. FPNAs attest that a connectionist paradigm may represent a genuinely practical model of parallel computing.

Journal ArticleDOI
TL;DR: The design principles and optimizations necessary to develop efficient and scalable Web servers are outlined and how the JAWS OO design is customized to leverage advanced features of Windows NT on multi-processor platforms linked by high-speed ATM networks is described.
Abstract: This paper provides two contributions to the study of high-performance object-oriented (OO) Web servers. First, it outlines the design principles and optimizations necessary to develop efficient and scalable Web servers and illustrates how we have applied these principles and optimizations to create JAWS. JAWS is a high-performance Web server that is designed to alleviate overheads incurred by existing Web servers on high-speed networks. In addition to its highly extensible OO design, it is also highly efficient, consistently outperforming existing Web servers, such as Apache, Java Server, PHTTPD, Zeus, and Netscape Enterprise, over 155 Mbps ATM networks on UNIX platforms. Second, this paper describes how we have customized the JAWS OO design to leverage advanced features of Windows NT on multi-processor platforms linked by high-speed ATM networks. The Windows NT features used in JAWS include asynchronous mechanisms for connection establishment and data transfer. Our previous benchmarking studies demonstrate that once the overhead of disk I/O is reduced to a negligible constant factor (e.g., via memory caches), the primary determinants of Web server performance are its concurrency and event dispatching strategies. Our performance results over a 155 Mbps ATM network indicate that certain Windows NT asynchronous I/O mechanisms (i.e., TransmitFile) provide superior performance for large file transfers compared with conventional synchronous multi-threaded servers. Conversely, synchronous event dispatching performed better for files less than 50 Kbytes. Thus, to provide optimal performance, a Web server design should be adaptive, i.e., choosing to use different mechanisms (such as TransmitFile) to handle requests for large files, while using alternative I/O mechanisms (such as synchronous event dispatching) on requests for small files.
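The adaptive strategy described at the end of the abstract can be summarized in a few lines (a hypothetical Java sketch of the decision only; JAWS itself is an object-oriented framework built on Windows NT APIs):

```java
// Hypothetical sketch of the adaptive dispatching choice reported above:
// an asynchronous bulk-transfer path (TransmitFile-like) for large files,
// a synchronous path for small ones. The 50 KB threshold is the figure
// cited in the abstract; a real server would tune it empirically.
class AdaptiveFileSender {
    private static final long SMALL_FILE_LIMIT = 50 * 1024;

    void send(String path, long fileSize) {
        if (fileSize < SMALL_FILE_LIMIT) {
            sendSynchronously(path);    // lower per-request overhead wins
        } else {
            sendAsynchronously(path);   // overlap I/O and avoid extra copies
        }
    }

    private void sendSynchronously(String path)  { /* read and write on the worker thread */ }
    private void sendAsynchronously(String path) { /* hand off to asynchronous I/O machinery */ }
}
```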

Journal ArticleDOI
TL;DR: The paper focuses on predictive simulation modeling, which is used for detailed performance analysis of existing systems and may be motivated by "what-if" studies or by a desire to improve a system's unsatisfactory performance.
Abstract: Every transaction system supporting business processes must satisfy some computational performance requirements. If it does not, the system may have a significantly reduced value or become useless. The number of factors that influence the performance of distributed transaction systems is so large that the traditional capacity planning methods appropriate for centralized mainframe systems are not applicable. The most cost-effective method has proven to be predictive modeling. A combination of analytic and simulation modeling is most useful for analyzing and tuning the performance of systems that have been designed or are in production. The task of designing a system ab initio given expected workloads and required performance/cost constraints is much harder. In this special issue, the paper by Marc Brittan and Janusz Kowalik describes a method for designing initial system configurations and capacities. This paper focuses on predictive simulation modeling that is used for detailed performance analysis of existing systems. Such analyses may be motivated by "what-if" studies or by a desire to improve a system's unsatisfactory performance.

Journal ArticleDOI
TL;DR: This work uses fast analytic approximations, coupled with simulated annealing, to perform the preliminary design search, and uses simulation after a candidate set of well defined systems configurations has been selected for further investigation.
Abstract: The design of large scale distributed systems involves a large number of problems in discrete and continuous optimization. Many of these problems fall into the complexity class of NP-Complete, and beyond. Although simulation has long been a method of choice for evaluating systems performance of well defined systems, it is not practical to perform simulation inside a combinatorial optimization loop. Because of this complexity in the discrete part of the search of the design space, we are forced to use fast analytic approximations, coupled with simulated annealing, to perform the preliminary design search, and use simulation after a candidate set of well defined systems configurations has been selected for further investigation.
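A minimal sketch of the search loop implied here, with a fast analytic cost estimate standing in for simulation inside the optimization loop (illustrative Java, not the authors' code):

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Illustrative simulated-annealing skeleton for configuration search, where
// estimateCost() is a fast analytic performance/cost approximation and only
// the surviving candidate configurations would later be handed to a detailed
// simulator. Not the authors' code; all parameters are placeholders.
class DesignSearch {
    interface Config { Config randomNeighbor(Random rng); }

    Config anneal(Config start, ToDoubleFunction<Config> estimateCost) {
        Random rng = new Random();
        Config current = start;
        double currentCost = estimateCost.applyAsDouble(current);
        for (double temp = 1.0; temp > 1e-3; temp *= 0.95) {        // cooling schedule
            Config candidate = current.randomNeighbor(rng);
            double delta = estimateCost.applyAsDouble(candidate) - currentCost;
            // Always accept improvements; accept worse moves with a probability
            // that shrinks as the temperature falls.
            if (delta < 0 || rng.nextDouble() < Math.exp(-delta / temp)) {
                current = candidate;
                currentCost += delta;
            }
        }
        return current;
    }
}
```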

Journal ArticleDOI
TL;DR: This work presents SIMA (System for distributed IMage Applications), a tool for constructing distributed applications for the analysis of biomedical images, developed in Java; no previous knowledge of distributed programming is required for its proper use.
Abstract: In this work we present SIMA (System for distributed IMage Applications), a tool for constructing distributed applications for the analysis of biomedical images. Such applications involve a set of operations that range from image enhancement to the production of results that allow users to make a diagnosis. Using SIMA it is possible to exploit, in an integral way, the available resources in a heterogeneous computer network for image processing. The tool has been developed in Java, and no previous knowledge of distributed programming is required for its proper use. A graphical interface allows the definition of image-processing applications. A distributed application is constructed by composition of operations, starting from a set of default basic operations (segmentation, filters, calculation of areas).
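The composition of operations described above can be pictured with a small Java sketch (hypothetical interfaces chosen for illustration; SIMA's actual classes are not given in the abstract, and the real tool dispatches the stages to machines across the network):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of building an image-processing application by
// composing basic operations, in the spirit of SIMA.
interface ImageOperation {
    int[][] apply(int[][] pixels);        // a grey-level image as a 2D array
}

class OperationPipeline {
    private final List<ImageOperation> stages = new ArrayList<>();

    OperationPipeline then(ImageOperation op) { stages.add(op); return this; }

    int[][] run(int[][] image) {
        for (ImageOperation op : stages) {
            image = op.apply(image);      // each stage could run on a different node
        }
        return image;
    }
}

// Usage (with hypothetical operations): new OperationPipeline()
//         .then(medianFilter).then(segmentation).then(areaMeasure).run(image);
```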

Journal ArticleDOI
TL;DR: This paper presents a survey of parallel processing for image synthesis by means of ray tracing, focusing on the different parallelization strategies for ray tracing.
Abstract: Research on photo-realistic rendering aims to develop algorithms that render high-quality images of synthetic or artificial models. One of the most popular methods for obtaining this kind of image is ray tracing. This technique requires the evaluation of a very large number of light contributions in a scene that may be defined by several hundred thousand objects. To this end, the ray tracing algorithm must calculate a large number of ray-object intersections, which makes it very expensive in computation time. Fortunately, ray tracing is an intrinsically parallel algorithm, which offers several potential acceleration techniques to reduce the high rendering time of generating a single image. This paper presents a survey of parallel processing for image synthesis by means of ray tracing. First, we briefly describe the ray tracing algorithm and the different sequential acceleration techniques. The paper then focuses on the different parallelization strategies for ray tracing. These strategies are presented from the point of view of their main differences, as well as their advantages and disadvantages. Afterwards, we analyze the most interesting and promising research directions in ray tracing acceleration, directions which can also help other rendering algorithms.

Journal ArticleDOI
TL;DR: A hybrid scheduling algorithm is used which brings tasks and data together according to coherence between rays, which removes the worst hot-spots from the data parallel component and reschedules those as demand driven tasks, thereby evening out the workload while keeping communication overheads at bay.
Abstract: Parallelising ray tracing using a data parallel approach allows arbitrarily large models to be rendered, but its inherent load imbalances may lead to severe inefficiencies. To compensate for the uneven load distribution, demand-driven tasks may be split off and scheduled to processors that are less busy. We use a hybrid scheduling algorithm which brings tasks and data together according to coherence between rays. Coherent tasks are suitable for demand driven scheduling and the remainder is executed in data parallel mode. This method removes the worst hot-spots from the data parallel component and reschedules those as demand driven tasks, thereby evening out the workload while keeping communication overheads at bay. Results are presented for scenes of up to 295,000 polygons using a large cluster of Sun workstations. Finally, the hybrid scheduling algorithm is expanded by also sampling diffuse inter-reflection. This puts a significant additional strain on the data parallel component, the implications of which are examined by presenting and discussing relevant results.

Journal ArticleDOI
TL;DR: Hierarchical and distributed cluster models with dynamic cluster re-sizing and caching, used in combination with dynamic task and data management strategies to provide an efficient parallel implementation of volume visualization on a large distributed memory multiprocessor system, are discussed.
Abstract: Volume visualization is a powerful engineering tool. However, the visualization of a three-dimensional volume is computationally expensive, taking significant amounts of time to produce the images on conventional computers. Parallel processing offers the possibility of rendering the volume in acceptable times. This paper discusses hierarchical and distributed cluster models with dynamic cluster re-sizing and caching, which are used in combination with dynamic task and data management strategies to provide an efficient parallel implementation of volume visualization on a large distributed memory multiprocessor system.

Journal ArticleDOI
TL;DR: The syntax and semantics of the language features are presented, together with implementation issues such as the reduction of message communication cost, the efficient implementation of statically and dynamically created massive objects, the realization of synchronization schemes, and the object-to-node allocation scheme that minimizes communication cost.
Abstract: A-NETL is a parallel object-oriented language intended for managing small to massive parallelism with medium grain size. Its design goals are to treat data-parallel operations at the same cost as programming languages of the SIMD type, to support various styles of message passing among objects, and to provide several synchronization facilities for realizing autonomous control of objects. Starting from these design principles, we present the syntax and semantics of the language features and discuss implementation issues such as the reduction of message communication cost, the efficient implementation of statically and dynamically created massive objects, the realization of synchronization schemes, and the object-to-node allocation scheme that minimizes communication cost. We present performance results from the language's implementation on an A-NETL oriented multicomputer, on the AP1000 using the AP1000's message-passing library, and on a cluster of workstations using the PVM library.
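The object-to-node allocation problem mentioned in this abstract can be illustrated with a minimal sketch, written in Python rather than A-NETL (whose syntax is not shown in the abstract). The greedy rule, the node capacity, and all names such as `allocate` are illustrative assumptions, not the paper's algorithm: object pairs are considered in decreasing order of message traffic and co-located on the same node whenever capacity allows.

```python
# Hypothetical greedy object-to-node allocation sketch (not A-NETL itself):
# heavily communicating objects are co-located to reduce message traffic.

def allocate(objects, traffic, num_nodes, capacity):
    """traffic: dict {(obj_a, obj_b): messages_per_second}."""
    node_of = {}
    load = [0] * num_nodes

    def place(obj, node):
        # Place obj on node only if it is still unplaced and the node has room.
        if obj not in node_of and load[node] < capacity:
            node_of[obj] = node
            load[node] += 1

    # Consider object pairs in decreasing order of exchanged messages.
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        target = node_of.get(a, node_of.get(b, min(range(num_nodes), key=lambda i: load[i])))
        place(a, target)
        place(b, target)

    # Any object not covered by the traffic table goes to the least-loaded node.
    for obj in objects:
        place(obj, min(range(num_nodes), key=lambda i: load[i]))
    return node_of

if __name__ == "__main__":
    objs = ["producer", "filter", "consumer", "logger"]
    msgs = {("producer", "filter"): 900, ("filter", "consumer"): 800, ("consumer", "logger"): 5}
    print(allocate(objs, msgs, num_nodes=2, capacity=3))
```

Under these illustrative numbers the producer, filter, and consumer end up on one node and the rarely contacted logger on the other, which is the kind of trade-off an allocation scheme of this sort aims at.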

Journal ArticleDOI
TL;DR: This paper presents an evaluation of a multicriteria placement manager, implemented in a CORBA-compliant system, that optimizes the execution of distributed object-oriented applications by positioning their components so as to make the best use of resources.
Abstract: The aim of our study is to optimize the execution of distributed object-oriented applications by positioning their components so as to make the best use of resources. Placement of distributed applications is an open issue. In the context of distributed multiuser environments, where an application cannot assume anything about the behavior of the others, placement must be directed by the system to achieve this aim. Indeed, the system can dynamically observe the behavior of the applications and propose a placement. Placement decisions must be driven by two main factors, namely computer load and network load. In this paper we present an evaluation of a multicriteria placement manager that we implemented in a CORBA-compliant system.
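The two placement criteria named above can be sketched in a few lines of Python. This is not the authors' CORBA implementation; the weights, node names, and the `choose_node` helper are hypothetical, and serve only to show how observed CPU and network loads can be combined into a single placement decision.

```python
# Hypothetical multicriteria placement sketch: each candidate node is scored
# from its observed CPU load and network load, and the new component is
# placed on the node with the lowest combined score.

def choose_node(nodes, cpu_weight=0.6, net_weight=0.4):
    """nodes: dict mapping node name -> (cpu_load, net_load), both in [0, 1]."""
    def score(loads):
        cpu_load, net_load = loads
        return cpu_weight * cpu_load + net_weight * net_load
    return min(nodes, key=lambda name: score(nodes[name]))

if __name__ == "__main__":
    observed = {
        "node-a": (0.80, 0.20),   # busy CPU, quiet network
        "node-b": (0.30, 0.70),   # idle CPU, congested network
        "node-c": (0.40, 0.35),
    }
    print(choose_node(observed))  # -> node-c under these weights
```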

Journal ArticleDOI
TL;DR: A Scalable Portability Model (SPM) is called for that uses several novel techniques to meet the goals of performance, scalability, and portability for an important set of applications.
Abstract: High Performance Computing creates new demands on software applications with respect to performance, scalability and portability. The increased complexity of parallel machine architectures, on the one hand, and the variety of parallel programming paradigms, on the other, pose serious challenges to software developers. Constructing a high-performance program requires detailed knowledge of the computer's architectural features. This knowledge constitutes a detailed, albeit informal, model of computation against which the program is written. Similar characteristics must be considered in building a portable high-performance program, where the appropriate details are elusive and often unavailable at the time the program is written. In order to support this type of programming, we call for a Scalable Portability Model (SPM) that uses several novel techniques to meet these challenges. A portable high-performance program must be capable of adapting to the particular environment in which it is running. We call the technique for achieving this adaptation Two-Phase Adaptation. First, an automatic analysis and exploration of the underlying architectural environment is carried out. Second, an efficient matching between the application complexity and the environment complexity is performed. We present some of the techniques used and provide evidence that SPM has reached the goals of performance, scalability and portability for an important set of applications.
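The two phases described in the abstract can be illustrated with a small Python sketch. This is not the SPM implementation; the probe, the thresholds, and the parameter names (`block_kb`, `workers`) are assumptions chosen only to show the shape of Two-Phase Adaptation: first measure the environment, then match application parameters to it.

```python
# Hypothetical two-phase adaptation sketch: phase 1 probes the runtime
# environment, phase 2 derives application parameters from the probe.

import os
import time

def probe_environment(buffer_mb=8):
    """Phase 1: measure core count and a rough memory-copy bandwidth."""
    cores = os.cpu_count() or 1
    data = bytearray(buffer_mb * 1024 * 1024)
    start = time.perf_counter()
    _ = bytes(data)                      # one full copy of the buffer
    seconds = time.perf_counter() - start
    bandwidth_mb_s = buffer_mb / max(seconds, 1e-9)
    return {"cores": cores, "copy_bandwidth_mb_s": bandwidth_mb_s}

def adapt(env):
    """Phase 2: pick parameters from the probe; thresholds are illustrative."""
    block_kb = 256 if env["copy_bandwidth_mb_s"] > 2000 else 64
    workers = max(1, env["cores"] - 1)   # leave one core for the runtime
    return {"block_kb": block_kb, "workers": workers}

if __name__ == "__main__":
    env = probe_environment()
    print(env, adapt(env))
```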

Journal ArticleDOI
TL;DR: This editorial discusses the important current trend of using realistic models to design efficient and scalable parallel algorithms, including Valiant's BSP (Bulk Synchronous Parallel) computing model, proposed in 1990, and the Coarse-Grained Multicomputer (CGM) model, a simpler and more practical version of it.
Abstract: In this editorial we discuss the important current trend of using realistic models to design efficient and scalable parallel algorithms. Previous issues of Parallel and Distributed Computing Practices (PDCP) have emphasized the ubiquitous presence of parallel computing. In the first issue of PDCP, it is mentioned that parallel computing is invading the world of everyday computing through multiprocessor desktop systems. A recent issue of PDCP addresses the newly arrived and increasingly popular field of cluster computing, stating that over the decade clusters will span the entire range of high-performance computing platforms. Indeed, parallel computing has become the mainstream of high-performance computing. If we examine the list of the TOP500 Supercomputer Sites, which contains the five hundred most powerful systems installed, we can verify that all 500 systems on the list are parallel computers of some kind, ranging from 6 to 9,632 processors. Of these 500, nearly 90% have 64 processors or more. Parallel and Distributed Computing Practices, as indicated by its name, is concerned with practical issues of parallel computing, addressing the consequences of the trends in such areas as performance and applications. On the one hand, the rapid advances in new architectural designs of parallel computers are indisputable. On the other, it is still far from clear where we stand as far as the design of really efficient and scalable parallel algorithms is concerned. In his recent editorial in a special issue of Algorithmica on coarse-grained parallel algorithms, Dehne has addressed this problem in some depth. It is our intent to contribute to this discussion. Since the eighties, the PRAM (Parallel Random Access Machine) model has received considerable attention. It is a formal model that allows one to establish optimal results. Its importance also lies in the possibility of relating parallel complexity to sequential complexity defined on traditional sequential computing models. By removing algorithmic details such as communication and synchronization, the PRAM model allows one to focus on the structural characteristics of the problem domain. Furthermore, many of the techniques and methods designed for the PRAM model can be extended to other computing models. One should notice, however, that PRAM algorithms, when implemented in practice, leave much to be desired in terms of actual performance. Frequently, speedup results for theoretical PRAM algorithms do not match the actual speedups obtained in experiments performed on real parallel computers. So, in spite of the usefulness of the PRAM model as far as theory is concerned, we are desperately in need of more realistic parallel computing models. Among the realistic computing models, the most important is probably Valiant's BSP (Bulk Synchronous Parallel) computing model, proposed in 1990. A BSP computer consists of a set of processor/memory modules connected by a router that can deliver messages in a point-to-point fashion among the processors. In the BSP model, computation is divided into a sequence of supersteps separated by barrier synchronizations. A superstep in turn consists of local computation and data exchange among processors through the router. Though BSP makes it possible to simulate PRAM algorithms optimally on distributed-memory machines, Valiant observes the importance of designing parallel algorithms that take advantage of local computation and minimize global operations.
Valiant also points out situations in which PRAM simulations are not efficient, and these situations, unfortunately, occur on the majority of current parallel computers. Dehne et al. proposed a simpler and more practical version of the BSP model in 1993, referred to as the Coarse-Grained Multicomputer (CGM) model. With n the problem size, the CGM(n, p) model consists of p processors P1, …, Pp, each with O(n/p) local memory, connected through an arbitrary interconnection network. The term coarse-grained means that the local memory size is large; usually we require n/p > p. An algorithm in the CGM model consists of alternating local computation and global communication rounds. In a computation round the p processors compute independently on their respective local data, and the best possible sequential algorithm can be used in each processor for this local computation. In a communication round each processor may send O(n/p) data and receive O(n/p) data. It is required that all information sent from a given processor to another processor in one communication round be packed into one long message, thereby minimizing the message overhead. A CGM computation/communication round corresponds to a BSP superstep. The CGM model is particularly suitable in cases where the overall computation speed is considerably larger than the overall communication speed, and the problem size is considerably larger than the number of processors, which is usually the case in practice. The main advantage of the CGM model is its simplicity: it models the communication cost of a parallel algorithm using only a single parameter, namely the number of communication rounds. Nevertheless, it gives a realistic performance prediction for commercially available multiprocessors. The goal in the CGM model is to minimize the number of communication rounds as well as the total local computation time. For both BSP and CGM algorithms, it has been shown that minimizing the number of communication rounds leads to improved portability across different parallel architectures. The CGM model allows the exchange of O(n/p) data in each communication round. This is, of course, an upper bound. From a practical point of view, it is desirable that the amount of data transmitted in a communication round be independent of n, say O(p). The appearance of these more realistic models, such as BSP and CGM, has pushed the design of efficient and scalable parallel algorithms in the nineties to a new level. By examining the proceedings and journals in this area, one can verify the current state of the design of parallel algorithms in such models, many of which have actually been implemented and shown to give significant performance results. One perceives, however, that such advances are still modest compared to the advances in hardware and computer architecture design. The challenge in the next decade, at the dawn of the new millennium, is for researchers in algorithm design to close the gap between hardware and software in parallel computing. S. W. Song, Instituto de Matematica e Estatistica, Universidade de Sao Paulo
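The round structure described in this editorial can be made concrete with a minimal sketch. The use of mpi4py, the file name in the comment, and the choice of a global sum as the example are assumptions, not part of the editorial: the point is only to show one local computation round followed by one global communication round of O(p) data per processor.

```python
# Minimal CGM/BSP-style sketch (assumed library: mpi4py): each of p processes
# holds O(n/p) data, does a local computation round, then takes part in a
# single global communication round.
# Run with e.g.:  mpirun -np 4 python cgm_sum.py   (file name illustrative)

from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
p = comm.Get_size()
rank = comm.Get_rank()

n = 1_000_000                                       # total problem size (illustrative)
local = [random.random() for _ in range(n // p)]    # O(n/p) local data per processor

# Computation round: purely local work; the best sequential code can be used here.
local_sum = sum(local)

# Communication round: one global exchange, O(p) data moved per processor.
global_sum = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"global sum over {p} processors: {global_sum:.4f}")
```

Counting rounds rather than individual messages is exactly the simplification the CGM model makes: the sketch above costs one communication round regardless of n.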

Journal ArticleDOI
TL;DR: This paper discusses how Repo-3D simplifies exploratory programming of distributed 3D graphical applications, making it easy for programmers to rapidly evolve prototypes using a familiar multi-threaded, object-oriented programming paradigm.
Abstract: Repo-3D is a general-purpose, object-oriented library for developing distributed, interactive 3D graphics applications across a range of heterogeneous workstations. In this paper we discuss how Repo-3D simplifies exploratory programming of distributed 3D graphical applications, making it easy for programmers to rapidly evolve prototypes using a familiar multi-threaded, object-oriented programming paradigm. All data sharing of both graphical and non-graphical data is done via general-purpose distributed objects, presenting the illusion of a single distributed shared memory. Repo-3D is embedded in Repo, an interpreted, lexically-scoped, distributed programming language, allowing entire applications to be rapidly prototyped. We discuss Repo-3D's design and how it supports exploratory distributed programming, present a number of illustrative examples, and discuss the pros and cons of this model for other programming tasks.
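The "single distributed shared memory" illusion mentioned in the abstract can be pictured with a small conceptual sketch. This is not the Repo-3D API: the class, its replication scheme, and the in-process replicas are assumptions used only to show the pattern of an object whose updates are pushed to every copy, so callers behave as if there were one shared object.

```python
# Conceptual replicated-object sketch (not the Repo-3D API): updates to the
# master object are propagated to all replicas. Real systems would ship these
# updates over the network; here the replicas live in one process.

class SharedNode:
    """A replicated scene-graph node holding a single 'transform' value."""

    def __init__(self):
        self._replicas = []
        self.transform = None

    def replicate(self):
        # Create a copy that will receive all future updates from this object.
        copy = SharedNode.__new__(SharedNode)
        copy._replicas = []
        copy.transform = self.transform
        self._replicas.append(copy)
        return copy

    def set_transform(self, value):
        self.transform = value
        for replica in self._replicas:   # propagate the update to every replica
            replica.transform = value

if __name__ == "__main__":
    master = SharedNode()
    view = master.replicate()            # e.g. a second workstation's copy
    master.set_transform("rotate-90")
    print(view.transform)                # -> rotate-90
```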