
Showing papers in "Scalable Computing: Practice and Experience in 1999"


Journal Article
TL;DR: This text is an in-depth introduction to the concepts of Parallel Computing, designed for use in university-level computer science courses, and presents the current theories that are in use in industry today.
Abstract: Kai Hwang and Zhiwei Xu McGraw-Hill, Boston, 1998, 802 pp. ISBN 0-07-031798-4, $97.30 This text is an in-depth introduction to the concepts of Parallel Computing. Designed for use in university-level computer science courses, the text covers scalable architecture and parallel programming of symmetric multiprocessors, clusters of workstations, massively parallel processors, and Internet-based metacomputing platforms. Hwang and Xu give an excellent overview of these topics while keeping the text easily comprehensible. The text is organized into four parts. Part I covers scalability and clustering. Part II deals with the technology used to construct a parallel system. Part III pertains to the architecture of scalable systems. Finally, Part IV presents methods of parallel programming on various platforms and languages. The first chapter presents different models of scalability as divided into resources, applications, and technology. It defines three abstract models (PRAM, BSP, and phase parallel models) and five physical models (PVP, SMP, MPP, COW, and DSM systems). Chapter 2 introduces the ideas behind parallel programming, including processes, tasks, threads and environments. Chapter 3 introduces performance issues and metrics. As an introduction to Part II, Chapter 4 introduces the history of microprocessor types and their applications in the architectures of current systems. Chapter 5 deals with the issues of distributed memory. It discusses several models such as UMA, NORMA, CC-NUMA, COMA, and DSM. Chapter 6 presents gigabit networks, switched interconnects, and various other high-speed networking architectures used to construct clusters. Chapter 7 discusses the overheads created by parallel computing, such as threads, synchronization, and efficient communication between nodes. Part III (Chapters 8, 9, and 11) gives comparisons between various types of scalable systems (SMP, CC-NUMA, clusters, and MPP). The comparisons are based on hardware architecture, the system software, and special features that make each system unique. Chapter 10 compares various research and commercial clusters with an in-depth study of the Berkeley NOW, IBM SP2, and Digital TruCluster systems. Chapter 12 introduces the concepts of Part IV with details on parallel programming paradigms. Chapter 13 discusses communication between the processors using message passing programming (such as the MPI and PVM libraries). Chapter 14 studies the data parallel approach with an emphasis on Fortran 90 and HPF. With attention to detail through examples, Hwang and Xu have created a well-written introduction to Parallel Computing. The authors are distinguished for their contributions in this field. This text is based on cutting-edge research, providing the current theories that are in use in industry today. Bin Cong, Shawn Morrison and Michael Yorg, Department of Computer Science, California Polytechnic State University at San Luis Obispo

208 citations


Journal Article
TL;DR: This book classifies the entities that can be replicated in a distributed computing system into the following categories: data, processes, objects, and messages, and uses individual chapters to describe major methods for each of these replication categories.
Abstract: A. A. Helal, A. A. Heddaya, and B. B. Bhargava Kluwer Academic Publishers, Boston, 1996, 176 pp. ISBN 0-7923-9800-9, $118.50 Computing in the 1990s has reached the state of distributed computing. The basis of this form of computing is a distributed computing system, which is built on the following three components: (a) personal computers, (b) local and fast wide area networks, and (c) system and application software. By amalgamating computers and networks into one single computing system and providing appropriate system software, a distributed computing system has created the possibility of sharing information and peripheral resources. Furthermore, these systems improve the performance of a computing system and of individual users through parallel execution of programs, load balancing and sharing, and replication of programs and data. Distributed computing systems are also characterized by enhanced availability and increased reliability. Replication is the key to providing high availability, fault tolerance, and enhanced performance in a distributed computing system. As companies move toward systems that are more open and distributed, replication is becoming increasingly important to the ability to provide data and services that are current, correct and available, which is a key factor in maintaining a competitive advantage over rivals. However, replication has also generated some serious challenges. For example, when a replica is updated, how do we propagate such an update to other replicas? If multiple replicas are updated simultaneously and some of the updates conflict with one another, how do we resolve the conflict? How do we deal with the situation where a replica goes down and subsequently recovers? How do we deal with network partitions? Not surprisingly, considerable research efforts have been directed towards the solution of these challenges. However, most of the research results are scattered among many journals, conference proceedings, theses, and technical reports. The book by Helal, Heddaya and Bhargava has gathered the best of these research results and formed a coherent collection that includes definitions, theoretical background, algorithms, annotated bibliographies of commercial and experimental prototype systems, and more than 200 references. This book classifies the entities that can be replicated in a distributed computing system into the following categories: (a) data, (b) processes, (c) objects, and (d) messages. After an introduction to the goals and main approaches of replication techniques, the book uses individual chapters to describe the major methods for each of the above replication categories. The main focus of this part, however, is on the replication of data. The book also contains two chapters on replication issues in heterogeneous, mobile, and large-scale systems, and on the future of replication techniques. Apart from the normal chapters, the book contains a rich set of appendices that are very useful for further studies of replication techniques. These appendices include brief descriptions of two dozen commercial and experimental systems that use replication techniques, annotated bibliographies of selected literature on various topics of replication, and an introduction to serializability theory. Each annotated bibliography contains an overview of the specific topic and is written by an invited expert in that field.
The combination of the normal chapters and appendices covers the entire spectrum of replication, from definitions, concepts, and theories to algorithms, techniques, and systems. The book serves as an excellent introduction and roadmap to the issues of replication. Although it does not provide a detailed description of every replication technique and system, the book covers most of the fundamental work, which allows the reader to understand the roots of these techniques and systems and to find ways to search for the details. This book is a valuable reference for anyone studying replication techniques or implementing replication systems; in particular, it can be very useful to post-graduate students, practitioners, and beginning researchers of this exciting field. Wanlei Zhou, School of Computing and Mathematics, Deakin University

52 citations


Journal Article
TL;DR: In Search of Clusters: The Ongoing Battle in Lowly Parallel Computing is a comprehensive and well-organized exposition of an important branch of parallel computing, computer clusters.
Abstract: Gregory F. Pfister Prentice Hall PTR, Upper Saddle River, NJ, 1998, 608 pp. ISBN 0-13-899709-8, $35.96 In Search of Clusters: The Ongoing Battle in Lowly Parallel Computing is a comprehensive and well-organized exposition of an important branch of parallel computing, computer clusters. The underlying theme of this book can best be summarized by the author's definition of a cluster: a type of parallel or distributed system that consists of a collection of interconnected whole computers, and is used as a single, unified computing resource. As the author intended, this general definition allows his coverage to be descriptively inclusive yet not prescriptive; he continually acknowledges that there are many ways to build a cluster. For the sake of comparison, he dedicates a great deal of the book to the coverage of symmetric multiprocessors (SMPs) and Non-Uniform Memory Access machines (NUMAs), key competitors to clusters. In general, the content of the book is very accessible, as the author weaves elaborate prose with interjections of casual colloquialisms and bits of levity. His tone is truly engaging, and undoubtedly has the reader chuckling and smiling. The book can be used in upper-level undergraduate and lower-level graduate courses on parallelism, as well as by professionals in industry seeking information about this powerful new manifestation of parallelism. The author systematically covers the subject of clusters. In Part One, he motivates and defines by explaining what clusters are, why they are needed, and why they are needed now. This material does not depend on prior knowledge and can be used by students new to parallel computing. In Part Two, the hardware of clusters is covered. Here, he describes, compares, and contrasts the main hardware genres encompassed in clusters, SMPs, and NUMAs. In Part Three, the same treatment is given to software, including programming models, serialization, and specific parallel languages. In Part Four, he combines the material of the previous parts into a compelling presentation of what he states is the definitive characteristic of the cluster genre, the alliance between hardware and software. In this final part, he discusses overall systems and how they attain the many goals of any parallel architecture, such as performance, scalability, price/performance, and availability of nodes. He also includes a wonderful list of justifications for the one attribute that clusters offer that other parallel options do not, that of scavenging unused CPU cycles. Overall, the book is quite comprehensive and well-organized, with many good analogies and rich metaphors. In a very few instances, however, the metaphors get a bit reckless and resort to a (very small amount of) gender-biased language in descriptions. There are no end-of-chapter exercises, nor are key points highlighted. On the other hand, there is a plethora of creative examples and illustrations to elucidate complex ideas, which significantly aids the reader. In summary, the author's mastery of language makes reading this book a pleasure. The coverage is complete, thorough, and logically ordered. The book is an excellent and accessible resource and would prove to be a pedagogical aid to any computer enthusiast interested in parallel computing. Jennifer Seitzer, University of Dayton

33 citations


Journal Article
TL;DR: Rajkumar Buyya's edited volumes have arrived at the right time to provide the right kind of teaching material for the type of course mentioned above.
Abstract: Edited by: Rajkumar Buyya Prentice Hall PTR, Upper Saddle River, NJ, 1999 Volume 1: 881 pp., ISBN 0-13-013784-7, $54.00 Volume 2: 700 pp., ISBN 0-13-013786-5, $54.00 I used the two books as part of a course that I taught at IIT Delhi during July-Dec 1999. The course was called Special Topics in Computers and included Masters students from the Departments of Electrical Engineering and Computer Science. The idea behind the course was to introduce elements of Distributed and Clustered Computing to students who already have some background in Computer Architecture. In my opinion, a single course in Computer Architecture at the Masters level cannot do justice to the developments that have taken place in this exciting field over the past decades. I have been teaching Computer Architecture at IIT Delhi for about 8 years now and find that the number of topics that I have to include in the course has been increasing very quickly over the years! In 1992, when I first taught the course, I used to teach about Pipelined Computers, Systolic Arrays, SIMD array processors, and Multiprocessors. As time has passed, it has become necessary to include topics such as high-performance interconnection networks, wormhole routing, cache coherence protocols, instruction pipelines, superscalar processors, and instruction-level parallelism. More recently, I have realized that it is also necessary to teach elements of distributed computing and clustered computing. I believe that a course on selected topics from distributed and clustered computing is essential to all branches of Engineering today. High-performance computing is essential to all branches of Engineering, and high-performance clustered computing on clusters of workstations/PCs makes a lot of economic sense. A course such as this is interesting because it is easy for the students to practice what is taught in such a class without much infrastructure. This makes the course very exciting to students. When I taught parallel algorithms in my class, I could not ask my students to develop these algorithms for lack of good infrastructure. However, a cluster of workstations and PCs is available in almost every University today. Distributed computing is highly popular with students. Students are eager to learn Java and network programming. Rajkumar Buyya's edited volumes have arrived at the right time to provide the right kind of teaching material for the type of course I have mentioned above. The first volume covers architectural and system-level issues of clustered computing systems. It has 36 chapters organized into four sections, spanning 811 pages. Section I develops the motivation for high-performance clustered computing. Section II introduces various networking protocols and I/O mechanisms that are useful in clustered computing. Section III covers OS issues such as process scheduling and load balancing. Section IV includes a number of case studies of existing systems. The editor has made a considerable effort in gathering learning material for the volume. I think that this volume could be an excellent text for an advanced course on Computer Architecture. In the course that I taught, I used parts of Sections I and IV. To me, the second volume proved more useful. The second volume is concerned with applications of clustered computing and applications programming. It is divided into 3 sections. There are 29 chapters that span 604 pages. Both volumes have carefully prepared Glossaries and Indexes.
The first section of the second volume is on various programming environments and development tools. Although I limited myself to the discussion of the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI), the editor has also included entire chapters on linking these two environments, active objects, the component-based approach, LiPS, and WebOS. There is a chapter on debugging parallelized code that can be highly valuable for students and developers of applications. The second section is on Java for high-performance computing. These topics raised a lot of interest among students when I taught the course. The third section is on specific algorithms and applications such as parallel genetic algorithms, parallel simulation, distributed database systems, and graphics and image processing. The authors who have contributed to the two volumes are all known experts in the areas of Computer Architecture, Computer Networks, Distributed Computing, and Operating Systems. The authors and the editor must be congratulated on this extraordinary effort to compile so much useful material in one place. The books can serve not only as reference material for professional programmers in the modern IT industry, but also as excellent teaching material for courses related to Computer Architecture, Distributed Computing, and Operating Systems. The web page that the editor has created for the book is also a wonderful repository of learning material available on the web. I recommend these volumes wholeheartedly to all serious researchers and students interested in the areas of high-performance computing in general and clustered computing in particular. C. P. Ravikumar, IIT Delhi

12 citations


Journal Article
TL;DR: This book addresses a variety of algorithm design issues, through either comprehensive surveys or in-depth discussions, for structured matrices, and proposes several fast algorithms for the solution of such linear systems by considering the special properties of the structure and the numerical stability constraints.
Abstract: Edited by: Peter Arbenz, Marcin Paprzycki, Ahmed Sameh and Vivek Sarin Nova Science Publishers, Commack, NY, 1998, 197 pp. ISBN 1-56072-594-X, $79.95 The book is a well-organized, comprehensive collection of algorithms and ideas in the area of high performance solutions for structured linear systems, eigenvalue and singular value problems. The content of this text is impressive, not only summarizing the state of the art in the area, but also discussing many detailed implementation issues with supporting examples on numerous parallel and distributed architectures, from vector computers and shared- and distributed-memory multiprocessors to clusters of workstations. Hence, this book is an excellent reference for working professionals in scientific and engineering computing and its application areas. The book's topics fall mainly into four categories, namely linear system solvers, eigenvalue problems, matrices with special structure, and parallel computation. In the first part, on linear system solvers, direct solution techniques are discussed mainly in terms of performance and stability for large and sparse linear systems with a certain structure inherent in the problems. The effectiveness of different approaches is demonstrated by extensive experiments carried out on different architectures, from vector computers and shared- and distributed-memory parallel computers to workstations. In the second part, on eigenvalue problems, several approaches are highlighted for computing eigenvalues and singular values of structured matrices. They proceed either by reducing banded matrices to bidiagonal and tridiagonal form through numerically stable and efficient orthogonal transformations, or by improving bisection and divide-and-conquer algorithms on parallel architectures. The theoretical considerations are verified by experimental results on vector and distributed memory computers. In the third part, on matrices with special structure, special structured matrices such as Hankel and Toeplitz systems arising in many applications, for example signal processing, are treated. The authors propose several fast algorithms for the solution of such linear systems by considering the special properties of the structure and the numerical stability constraints. In the last part, on parallel computation, the issues related to the question of how to efficiently implement the existing algorithms for structured matrices are covered, ranging from load balance to optimized implementation. In summary, this book addresses a variety of algorithm design issues, through either comprehensive surveys or in-depth discussions, for structured matrices. The only small disappointment is that iterative methods for parallel and distributed architectures should have been touched on as well. Overall, this text would be a very valuable reference for professionals or students working on structured matrix related problems. Laurence Tianruo Yang St. Francis Xavier University

7 citations


Journal Article
TL;DR: In this paper, the authors present the design, implementation and evaluation of a new distributed shared memory (DSM) coherence model called multiple-writer entry consistency (MEC), which combines the efficient communication mechanisms of Lazy Release Consistency (LRC) with the flexible data management of the Shared Regions (SR) models.
Abstract: In this paper, we present the design, implementation and evaluation of a new distributed shared memory (DSM) coherence model called multiple-writer entry consistency (MEC). MEC combines the efficient communication mechanisms of Lazy Release Consistency (LRC) with the flexible data management of the Shared Regions and Entry Consistency (EC) models. This is achieved in MEC by decoupling synchronization from coherence (in contrast to the tight coupling of synchronization and coherence present in EC) while retaining the familiar synchronization structure found in Release Consistent (RC) programs. The advantage of MEC is that it allows region-based coherence protocols (those that manage data at the granularity of user-defined shared regions) to be used alongside page-based protocols within an application and within the RC framework. Our experimental evaluation on an 8-processor system shows that using MEC reduces parallel execution times by margins ranging from 5% to 46% in five of the six applications that we study. However, the parallel execution time of the LRC version of the remaining application is lower than that of the MEC version by 48%. We conclude that offering both page-based and region-based models for coherence within the same system is not only practical but necessary.
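To make the region idea concrete, the following minimal sketch shows how coherence can be attached to a user-defined region rather than to pages. The interface names (region_bind, region_acquire, region_release) are invented for illustration and stubbed so the fragment compiles; they are not the MEC implementation described in the paper.

```c
/* Illustrative sketch only: a hypothetical region-based interface in the
 * spirit of multiple-writer entry consistency. The region_* functions are
 * invented names, stubbed here so the sketch compiles. */
#include <stddef.h>
#include <stdlib.h>

typedef struct { void *addr; size_t len; } region_t;

/* stubs standing in for a DSM runtime */
static region_t *region_bind(void *addr, size_t len)
{
    region_t *r = malloc(sizeof *r);
    r->addr = addr;
    r->len = len;
    return r;
}
static void region_acquire(region_t *r, int exclusive) { (void)r; (void)exclusive; }
static void region_release(region_t *r)                { (void)r; }

/* Coherence traffic for this update is confined to one user-defined region,
 * while unrelated shared data in the same program can still be managed at
 * page granularity under the usual LRC protocol. */
void update_block(double *block, size_t n)
{
    region_t *r = region_bind(block, n * sizeof(double));
    region_acquire(r, 1 /* exclusive: we are a writer */);
    for (size_t i = 0; i < n; i++)
        block[i] *= 2.0;
    region_release(r);
    free(r);
}
```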

4 citations


Journal Article
TL;DR: This special issue of SCPE suggests that there are indeed a lot of activities under way, and that there is often a need to share the computing infrastructure between virtual supercomputers and the more mundane everyday use of the `borrowed' workstations.
Abstract: The history of computing has always also been a history of hunting for higher performance. Until recently it was accepted that to go beyond the current sweet spot, in terms of performance per unit of money, of available machines, one had to invest. In fact, to invest enormously. While dedicated high-end machines still have their place, and probably always will, the construction of special machines close to the top and that of clusters of less dramatic machines have co-evolved into something quite comparable. As a result, engineering sparks could bridge the gap, and we now find high-performance system area and even local area networks that derive from the backbone communication infrastructure of massively parallel supercomputers. Being faced with the possibility of assembling one's own supercomputer out of off-the-shelf components, such as high-end PCs, changed the performance scene markedly. The hardware substrate for modest to impressive supercomputing equipment is now suddenly ubiquitously available. As a result, not surprisingly, the parallel computing area has attracted a boost of attention. Also not surprisingly, there is a desire to reduce the traditional reliance on a select few who intimately understood the inner workings of the latest generation of supercomputers to really make them fly. Finally, there is often a need to share the computing infrastructure between virtual supercomputers and the more mundane everyday use of the `borrowed' workstations. Methodology and software infrastructure need to be almost reinvented to properly address all the new challenges - but without losing what has been learnt in the past. This special issue of SCPE suggests that there are indeed a lot of activities under way. As a result, the editors had great difficulty selecting papers out of the overwhelmingly large number of submissions. The selected articles demonstrate both the diversity and the quality of this still relatively young field. We would like to express our deep gratitude to Prof. Marcin Paprzycki, Editor-in-Chief of SCPE, for giving us an opportunity to guest-edit this special issue. This issue would not be possible without the help of the referees who worked very hard to review all the submitted papers. We would like to thank them all: Alessandro Bevilacqua Alfred Weaver Amin Vahdat Amitabh Dave Amy Apon Andrzej Goscinski Benedict Gomes Boleslaw Szymanski Boris Weissman Cho-Li Wang Chung-Ta King Dan Hyde David Bader Davide Rossetti Domenico Talia Dorina Petriu Dror Feitelson El-ghazali Talbi Erhan Saridogan Ernst Biersack Gagan Agrawal Gihwan Cho Giuseppe Ciaccio Harjinder Sandhu Hye-Seon Meeng Jamel Gafsi Jay Fenwick Jianyong Wang John Dougherty Kenneth Birman Lars Rzymianowicz Lori Pollock Luis Silva Maciej Golebiewski Mark Baker Mark Clement Marianne Winslett Mathew Chidester Michele Colajanni Orly Kremien Paddy Nixon Paul Roe Putchong Uthayopas Quinn Snell Rainer Fraedrich Rajeev Raje Rajeev Thakur Ricky Kwok Robert Brunner Robert Todd Samuel Russ Toni Cortes Yong Cho Yoshio Tanaka Yu-Kwong Kwok We hope you will find this special issue interesting. Rajkumar Buyya Monash University Clemens Szyperski Queensland University of Technology

4 citations



Journal Article
TL;DR: DOGMA is presented, a Java-based system which simplifies parallel computing on heterogeneous computers and provides a unified environment for developing high performance parallel applications on heterogeneous systems.
Abstract: Heterogeneous distributed computing has traditionally been a problematic undertaking which increases in complexity as heterogeneity increases. This paper presents results obtained using DOGMA, a Java-based system which simplifies parallel computing on heterogeneous computers. The performance of Java just-in-time compilers currently approaches that of C++ for many applications, making Java a serious contender for parallel application development. DOGMA provides support for dedicated clusters as well as idle workstations through the use of a web-based browse-in feature or the DOGMA screen saver. DOGMA supports parallel programming in both a traditional message passing form and a novel object-oriented approach. This research provides a unified environment for developing high performance parallel applications on heterogeneous systems.

3 citations


Journal Article
TL;DR: The book provides the fundamental concepts necessary for the traffic management, design, use, performance issues and implementation of ATM networks, and compares different modeling techniques for evaluating the performance of computer communication networks with direct measurement techniques.
Abstract: Mohsen Guizani and Ammar Rayes McGraw-Hill Series on Computer Communications McGraw-Hill, New York, NY, 1999, 224 pp. ISBN 0-07-025217-3, $59.95 Asynchronous Transfer Mode (ATM) emerged in the 1980s as a promising transfer mode for high speed computer networking, particularly for Broadband Integrated Services Digital Network (B-ISDN) systems. The motivations for this emerging technology arise from the advantages provided by ATM, such as statistical multiplexing, high speed switching capabilities, inter-operability, portability, reliability, and its support for conventional, real-time and non-real-time multimedia and Internet applications. The composition of traffic carried by ATM networks is a mixture of low speed voice and data, along with high speed video, image, and interactive traffic. The development of efficient ATM traffic management and network (switching and transmission) design methods relies heavily on an understanding of the Quality of Service requirements of various services and of the traffic characteristics. Managing service quality in ATM networks is a complex and essential task for network engineers and service provider companies. The development of appropriate traffic and capacity management models and methods depends crucially on a clear understanding of Quality of Service requirements, statistical characteristics of the traffic, and performance evaluation methods. While there is over one hundred years of experience with voice networks, little is known about networks that carry a mixture of voice, data, video, and interactive traffic. Due to the interactions among these traffic streams, routing involves complex decision rules that are tightly intertwined; it is critical to understand such interactions, and thereby decouple the decision rules, in order to crystallize the essence of the routing policies being formulated. Due to the importance of this subject for both industry and research, the need arises for a book that covers these topics in detail. Throughout the book, entitled Designing ATM Switching Networks, major concepts are first explained in a simple, non-mathematical way. This is followed by careful descriptions of the modeling issues and then by mathematical analysis. The insights to be gained from the analysis are explained, and examples are given to clarify the more subtle issues. The book stresses the fundamentals of ATM operations, switch architecture and functions, protocol modeling, Quality of Service requirements, traffic modeling and control, fault tolerance, traffic and capacity management functions, and routing. The book provides the fundamental concepts necessary for the traffic management, design, use, performance issues and implementation of ATM networks. ATM routing algorithms are introduced and then analyzed in the book. Moreover, the authors present algorithms to evaluate the performance of the routing schemes presented and compare them against simulation results. Since optical networks are potential candidates for the future telecommunication infrastructure, an overview of optical ATM networks is presented in the book. It introduces the advantages of optical technology and the architectures of ATM optical switches designed by different researchers. Recent projects developed in this area are also presented. The book compares different modeling techniques for evaluating the performance of computer communication networks with direct measurement techniques. It then introduces analytical modeling and queuing networks.
After that, a review of discrete-time arrival processes is presented. Finally, different types of stochastic processes and the analysis of a single M/M/1 queue, the M/D/1 queue, and networks of queues are summarized. In addition, the book reviews the main advantages and disadvantages of using simulation to perform the analysis of computer communication networks. Some of the commercial software packages that are used to perform such simulation studies, and the main features that are supported by each package, are reviewed. Finally, the book discusses most of the required traffic measurements and the transport performance objectives for broadband switching systems as specified in the Bellcore Generic Requirements. This book is meant to be used as a reference for systems designers, hardware and software engineers, R&D managers, and market planners who seek an understanding of local and wide area broadband networks. The first part of the book (Chapters 1, 2, 3, 4, 5, 6, and 10) can be used for an undergraduate senior course. The second part (Chapters 3, 4, 5, 6, 7, 8, 9, and 10) can be used for a graduate course with emphasis on research topics in the field. There are many research ideas and open problems presented in Chapters 4, 5, 6, 7, and 8. Problems at the end of each chapter are not available at this point, but the authors are producing a set of problems that can be supplied to whoever will be using the book for teaching a course. Overall, I found the book extremely useful both in its fundamental and its practical treatment of ATM networks. Such a dual purpose is lacking from current books on the topic. As a result, I strongly urge researchers working in this area and students wanting to know more about this topic to take a look at this book. It will be all the help that they can get. Mounir Hamdi Hong Kong University of Science and Technology, China

2 citations


Journal Article
TL;DR: The most important design issues and tradeoffs related to the functionality and performance of MSAs are discussed: the communication model, DMA versus programmed I/O transfers, data copying and protection, message pipelining, message arrival notification, and reliability.
Abstract: Several messaging software architectures (MSAs) have been proposed that entirely remove the operating system from the critical communication path, providing direct user-level access to the network interface and avoiding excessive data copying. In this paper we discuss the most important design issues and tradeoffs related to the functionality and performance of MSAs: the communication model, DMA versus programmed I/O transfers, data copying and protection, message pipelining, message arrival notification, and reliability. In order to illustrate how these issues and tradeoffs are tackled in modern systems, we survey a large number of recently proposed MSAs for the Myrinet interconnection network, including AM, FM, U-Net, VMMC, BIP, PM, and Trapeze.

Journal Article
TL;DR: The modification of PVM (Parallel Virtual Machine) to enable interoperation with WIN32 (Windows NT 4/5 and Windows 95/98) is described.
Abstract: The increasing number of clusters built from commercial, off-the-shelf hardware shows a new trend in scientific computing. Within this movement is a shift towards Windows NT as an operating system on PCs. The UNIX environment and Windows NT differ in terms of administrative issues as well as programming techniques. In this paper, we describe the modification of PVM (Parallel Virtual Machine) to enable interoperation with WIN32 (Windows NT 4/5 and Windows 95/98). PVM provides the functionality to harness the computing power of a WIN32 cluster environment. Our migration from UNIX to the WIN32 architecture not only shows where porting existing software is easy and where more generic modules have to be designed, but also exposes the limitations.
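Because PVM's programming interface is unchanged by the port, the same source can be compiled against either the UNIX or the WIN32 version of the library. The following minimal master/worker example uses only standard PVM 3 calls; the spawned executable name (pvm_echo) and its availability on the virtual machine's path are assumptions made for illustration.

```c
/* Minimal PVM 3 example: the master spawns one copy of itself, sends an
 * integer, and the worker doubles it and replies. The same code runs under
 * the UNIX and WIN32 ports of PVM. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int mytid  = pvm_mytid();         /* enroll in the virtual machine */
    int parent = pvm_parent();

    if (parent == PvmNoParent) {      /* master: spawn one worker copy of ourselves */
        int child, value = 21, reply;
        pvm_spawn("pvm_echo", NULL, PvmTaskDefault, "", 1, &child);
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_send(child, 1);           /* request, message tag 1 */
        pvm_recv(child, 2);           /* wait for the reply, tag 2 */
        pvm_upkint(&reply, 1, 1);
        printf("master %x got %d\n", mytid, reply);
    } else {                          /* worker: double the value and reply */
        int value;
        pvm_recv(parent, 1);
        pvm_upkint(&value, 1, 1);
        value *= 2;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_send(parent, 2);
    }
    pvm_exit();
    return 0;
}
```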

Journal Article
TL;DR: A portable, high performance thread migration system, implemented with user level threads and active messages, that runs on a cluster of SMPs and the performance obtained is orders of magnitude better than other reported measurements.
Abstract: Thread migration is established as a mechanism for achieving dynamic load sharing and data locality. However, migration has not been used with fine-grained parallelism on workstation clusters due to the relatively high overheads associated with thread and messaging packages. This paper describes a portable, high performance thread migration system, implemented with user level threads and active messages. The thread system supports an extensible event mechanism which permits an efficient interface between the thread and active message system. Migration is supported by user level primitives; applications may implement different migration policies on top of the migration interface provided. The system runs on a cluster of SMPs and the performance obtained is orders of magnitude better than other reported measurements.
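The separation between the migration mechanism (a user-level primitive) and the migration policy (supplied by the application) can be illustrated with a small toy program. The sketch below is invented for illustration and is not the paper's thread system; it merely shows a load-sharing policy layered on top of a migrate() primitive.

```c
/* Toy sketch of policy-over-mechanism layering: migrate() is the primitive,
 * balance() is an application-supplied policy. A real system would ship a
 * thread's stack and register state to the destination node over active
 * messages; here migration just updates bookkeeping. */
#include <stdio.h>

#define NODES 4

typedef struct { int id; int node; } thread_t;

static int load[NODES];                       /* runnable threads per node */

/* mechanism: move a thread to another node */
static void migrate(thread_t *t, int dest)
{
    load[t->node]--;
    load[dest]++;
    printf("thread %d: node %d -> node %d\n", t->id, t->node, dest);
    t->node = dest;
}

/* policy: move to the least-loaded node whenever the local node holds
 * at least two more runnable threads than that node */
static void balance(thread_t *t)
{
    int best = t->node;
    for (int n = 0; n < NODES; n++)
        if (load[n] < load[best]) best = n;
    if (load[t->node] - load[best] >= 2)
        migrate(t, best);
}

int main(void)
{
    thread_t threads[6];
    for (int i = 0; i < 6; i++) {             /* start with everything on node 0 */
        threads[i] = (thread_t){ .id = i, .node = 0 };
        load[0]++;
    }
    for (int i = 0; i < 6; i++)               /* policy runs at scheduling points */
        balance(&threads[i]);
    return 0;
}
```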

Journal Article
TL;DR: In this paper, the authors describe a software-based approach to using the network memory in a workstation cluster as a layer of Non-Volatile memory (NVRAM), which transforms a set of volatile main memories residing in independent workstations into a fault-tolerant memory.
Abstract: File systems and databases usually make several synchronous disk write accesses in order to make sure that the disk always has a consistent view of their data, and that data can be recovered in the case of a system crash. Since synchronous disk operations are slow, some systems choose to employ asynchronous disk write operations, at the cost of low reliability: in case of a system crash, all data that have not yet been written to disk are lost. In this paper we describe a software-based approach to using the network memory in a workstation cluster as a layer of Non-Volatile memory (NVRAM). Our approach takes a set of volatile main memories residing in independent workstations and transforms it into a fault-tolerant memory, much like RAIDs do with magnetic disks. This layer of NVRAM allows us to create systems that combine the reliability of synchronous disk accesses with the cost of asynchronous disk accesses. We demonstrate the applicability of our approach by integrating it into existing database systems, and by developing novel systems from the ground up. We use experimental evaluation to characterize the performance of our systems. Our experiments suggest that our approach may improve performance by as much as two orders of magnitude.
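The commit path of such a system can be sketched as follows: a transaction is acknowledged as soon as the dirty data has been copied into the memories of a few other workstations, and the disk write is deferred. The fragment below is a toy illustration only, with the remote memories simulated by local buffers rather than real network transfers.

```c
/* Toy sketch (not the authors' implementation): a commit is acknowledged
 * once the dirty page has been mirrored into K "remote" memories, so the
 * slow synchronous disk write can be deferred. In reality each memcpy would
 * be a network send to a memory server on another workstation. */
#include <stdio.h>
#include <string.h>

#define K    2            /* number of remote memory replicas per page */
#define PAGE 4096

static char remote_replica[K][PAGE];   /* stand-ins for memory on other nodes */

/* Replicate a dirty page to K remote memories, then acknowledge.
 * A crash of the writing node can be recovered from any surviving replica,
 * which replays the page to disk during recovery. */
static void commit_page(const char *page)
{
    for (int r = 0; r < K; r++)
        memcpy(remote_replica[r], page, PAGE);
    /* transaction can be acknowledged here; the disk write proceeds
       asynchronously in the background */
}

int main(void)
{
    char page[PAGE];
    memset(page, 0xAB, PAGE);          /* pretend this is dirty database data */
    commit_page(page);
    printf("commit acknowledged after %d remote copies\n", K);
    return 0;
}
```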

Journal Article
Gagan Agrawal
TL;DR: This paper presents a general interprocedural technique for performing communication optimizations across procedure boundaries; it uses the result of local analysis to model the communication as a communication loop, and then performs flow-sensitive interprocedural data-flow analysis to avoid redundant communication and to perform communication aggregation.
Abstract: Because of the increasing computational power of workstations and PCs, the peak processing power of clusters of workstations has been increasing at a rapid pace. However, the sustained performance on a variety of applications lags far behind, because these systems offer lower communication performance. In this paper, we focus on improving the communication performance of applications run on clusters through aggressive compiler optimizations. We present a general interprocedural technique for performing communication optimizations across procedure boundaries. Our technique uses the result of local analysis to model the communication as a communication loop, and then performs flow-sensitive interprocedural data-flow analysis to avoid redundant communication and to perform communication aggregation. Our experimental results and projected analysis on clusters show that aggressive communication optimizations from compilers are very important for systems with low communication performance and high computational power.
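As a concrete illustration of communication aggregation, the kind of rewrite such a compiler pass performs can be shown by hand with MPI: a loop that sends one element per iteration is replaced by a single message, paying the per-message startup cost once instead of N times. This example is illustrative and is not taken from the paper.

```c
/* Hand-written before/after illustration of communication aggregation. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

/* Before: one message per element; N per-message startup costs dominate
 * on a workstation cluster. */
static void send_naive(const double *a, int dest)
{
    for (int i = 0; i < N; i++)
        MPI_Send(&a[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}

/* After: the communication loop is recognized and replaced by a single
 * aggregated transfer, paying the startup cost only once. */
static void send_aggregated(const double *a, int dest)
{
    MPI_Send(a, N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    int rank;
    double buf[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        for (int i = 0; i < N; i++) buf[i] = i;
        send_naive(buf, 1);
        send_aggregated(buf, 1);
    } else if (rank == 1) {
        for (int i = 0; i < N; i++)        /* matches the naive sender */
            MPI_Recv(&buf[i], 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d elements twice\n", N);
    }
    MPI_Finalize();
    return 0;
}
```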

Journal Article
TL;DR: A workbench developed to provide a framework to perform the distributed processing of Landsat images using a cluster of NT workstations based on the NT implementation (WinMPICH) of the MPI standard.
Abstract: One of the constraints in remotely-sensed image processing is the computational resources required. Distributed processing is applied in remote sensing in order to reduce spatial or temporal cost using the message passing paradigm. In this paper, we present a workbench developed to provide a framework for performing the distributed processing of Landsat images using a cluster of NT workstations. A set of functions and a message structure, called DIPORSI, have been designed to distribute the main algorithms employed in remote sensing. Thus, the large amount of time required by the sequential process drops when the distributed algorithm is used. Our application is based on the NT implementation (WinMPICH) of the MPI standard.
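A hand-written MPI sketch of the row-wise distribution such a workbench performs is shown below. It is a generic scatter/process/gather example, not the DIPORSI message structure itself; the image dimensions and the per-pixel operation are invented for illustration.

```c
/* Generic MPI sketch: scatter contiguous blocks of image rows to the
 * processes of a cluster, apply a per-pixel operation, gather the result. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ROWS 512
#define COLS 512

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows_per_proc = ROWS / size;               /* assumes ROWS divisible by size */
    unsigned char *image = NULL;
    unsigned char *band  = malloc((size_t)rows_per_proc * COLS);

    if (rank == 0) {                               /* master holds the full scene */
        image = malloc((size_t)ROWS * COLS);
        for (int i = 0; i < ROWS * COLS; i++) image[i] = (unsigned char)(i % 256);
    }

    /* scatter one block of rows to every process */
    MPI_Scatter(image, rows_per_proc * COLS, MPI_UNSIGNED_CHAR,
                band,  rows_per_proc * COLS, MPI_UNSIGNED_CHAR,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < rows_per_proc * COLS; i++) /* e.g. invert the band values */
        band[i] = (unsigned char)(255 - band[i]);

    /* gather the processed rows back on the master */
    MPI_Gather(band,  rows_per_proc * COLS, MPI_UNSIGNED_CHAR,
               image, rows_per_proc * COLS, MPI_UNSIGNED_CHAR,
               0, MPI_COMM_WORLD);

    if (rank == 0) { printf("processed %dx%d image\n", ROWS, COLS); free(image); }
    free(band);
    MPI_Finalize();
    return 0;
}
```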

Journal Article
TL;DR: A heuristic algorithm based on dynamic critical path (DCP) to solve the problem of task scheduling on networks of workstations (NOW) and shows performance much superior to previously known techniques.
Abstract: In this paper, we propose a heuristic algorithm based on dynamic critical path (DCP) to solve the problem of task scheduling on networks of workstations (NOW). The algorithm takes into account both the characteristics of DCP and NOW by intelligently pre-allocating network communication resources so as to avoid potential communication conflicts. It has a reasonable computational complexity of O(v^2(1+p)), where v is the number of nodes in the task graph and p is the number of processors in the NOW. It is suitable for regular as well as irregular task graphs. The algorithm, when tested on various task graphs, shows performance much superior to that of previously known techniques.
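For readers unfamiliar with critical-path scheduling, the toy program below shows a simplified static list scheduler: it uses bottom-level (critical-path) priorities and assigns each task to the processor giving the earliest finish time, charging a communication cost only when a predecessor was placed on a different processor. It is a classical baseline shown for illustration, not the authors' DCP algorithm with dynamic priorities and pre-allocated communication resources, and the small task graph is made up.

```c
/* Simplified critical-path list scheduling on a tiny, invented task graph. */
#include <stdio.h>

#define V 5   /* tasks */
#define P 2   /* processors */

static int w[V] = { 2, 3, 4, 3, 1 };    /* computation cost of each task */
static int c[V][V] = {                  /* communication cost per edge (0 = no edge) */
    {0, 1, 2, 0, 0},
    {0, 0, 0, 2, 0},
    {0, 0, 0, 1, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 0, 0},
};

static int blevel[V];

static int max2(int a, int b) { return a > b ? a : b; }

static int bottom_level(int v)          /* longest path from v to the exit task */
{
    if (blevel[v]) return blevel[v];
    int best = 0;
    for (int u = 0; u < V; u++)
        if (c[v][u]) best = max2(best, c[v][u] + bottom_level(u));
    return blevel[v] = w[v] + best;
}

int main(void)
{
    int proc_ready[P] = {0};            /* time each processor becomes free */
    int finish[V], place[V];

    for (int v = 0; v < V; v++) bottom_level(v);

    /* tasks 0..V-1 are already in topological order, and for this tiny graph
       that order also matches decreasing bottom-level priority */
    for (int v = 0; v < V; v++) {
        int best_p = 0, best_finish = 1 << 30;
        for (int p = 0; p < P; p++) {
            int start = proc_ready[p];
            for (int u = 0; u < V; u++)  /* data-ready time from predecessors */
                if (c[u][v])
                    start = max2(start, finish[u] + (place[u] == p ? 0 : c[u][v]));
            if (start + w[v] < best_finish) { best_finish = start + w[v]; best_p = p; }
        }
        place[v] = best_p;
        finish[v] = best_finish;
        proc_ready[best_p] = best_finish;
        printf("task %d (blevel %d) -> P%d, finishes at %d\n",
               v, blevel[v], best_p, finish[v]);
    }
    return 0;
}
```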

Journal Article
TL;DR: In this paper, the authors describe a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors by overlapping the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses.
Abstract: This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses. The transformation peels from a parallel loop the iterations that access remote data and re-schedules them after the execution of the iterations that access only local data (local-only iterations). Asynchronous prefetching of remote data is used to overlap non-local access latency with the execution of local-only iterations. Experimental evaluation of the transformation on a NOW multiprocessor indicates that it is generally effective in improving parallel execution time (by up to 1.9 times). The extent of the benefit is determined by three factors: the size of the local-only computations, the significance of remote memory access latency, and the position of the iterations that access remote data in a parallel loop.
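The effect of the transformation can be sketched by hand on a simple loop: iterations touching remote data are peeled off, their data is prefetched asynchronously, and the local-only iterations run while the fetches are in flight. The helper functions in this sketch (is_local, prefetch_async, prefetch_wait) are hypothetical stand-ins for a software shared-memory runtime and are stubbed so the fragment compiles; the paper's compiler performs this rewrite automatically.

```c
/* Hand-written sketch of loop peeling with asynchronous prefetching. */
#include <stddef.h>

static int  is_local(size_t i)        { return (i % 4) != 0; }  /* stub: every 4th element "remote" */
static void prefetch_async(double *p) { (void)p; }              /* stub: would start a non-blocking fetch */
static void prefetch_wait(void)       { }                       /* stub: would wait for outstanding fetches */

/* Original parallel loop: a remote access stalls each offending iteration. */
void scale_original(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0;                                   /* may fault on remote pages */
}

/* Transformed loop: peel remote iterations, prefetch their data
 * asynchronously, and overlap the fetch with the local-only iterations. */
void scale_transformed(double *a, size_t n, size_t *peeled, size_t *npeeled)
{
    *npeeled = 0;
    for (size_t i = 0; i < n; i++)                     /* pass 1: classify and prefetch */
        if (!is_local(i)) {
            prefetch_async(&a[i]);
            peeled[(*npeeled)++] = i;
        }
    for (size_t i = 0; i < n; i++)                     /* pass 2: local-only iterations */
        if (is_local(i))
            a[i] *= 2.0;
    prefetch_wait();                                   /* latency hidden behind pass 2 */
    for (size_t k = 0; k < *npeeled; k++)              /* pass 3: peeled remote iterations */
        a[peeled[k]] *= 2.0;
}
```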