Author

Sergei Gorlatch

Bio: Sergei Gorlatch is an academic researcher at the University of Münster. The author has contributed to research in topics including CUDA and Grid computing. The author has an h-index of 27, has co-authored 235 publications, and has received 2,692 citations. Previous affiliations of Sergei Gorlatch include the University of Passau and the Technical University of Berlin.
Topics: CUDA, Grid, Compiler, Scalability, The Internet


Papers
Book
11 Nov 2002
TL;DR: This book covers skeleton-based programming environments, a programming methodology with skeletons and collective operations, and multi-paradigm and design-pattern approaches for HW/SW design and reuse.
Abstract: 1. Foundations of Data-parallel Skeletons; 2. SAT: A Programming Methodology with Skeletons and Collective Operations; 3. Transforming Rapid Prototypes to Efficient Parallel Programs; 4. Parallelism Abstractions in Eden; 5. Skeleton Realisations from Functional Prototypes; 6. Task and Data Parallelism in P3L; 7. Skeleton-based Programming Environments; 8. Applying the Quality Connector Pattern; 9. Service Design Patterns for Computational Grids; 10. Towards Patterns of Web Services Composition; 11. Multi-paradigm and Design Pattern Approaches for HW/SW Design and Reuse.
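To make the book's central notion concrete: an algorithmic skeleton is a higher-order function that captures a recurring parallel pattern (such as map or reduce) and takes the application-specific sequential code as a parameter. A minimal illustrative sketch in C++, not taken from the book (function names are hypothetical, and parallelism is delegated to the standard parallel algorithms rather than to a cluster or GPU backend):

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// A "map" skeleton: applies a user-supplied function to every element.
// Here parallelism comes from the C++17 parallel algorithms; a real
// skeleton library would target clusters or GPUs instead.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& in, F f) {
    std::vector<T> out(in.size());
    std::transform(std::execution::par, in.begin(), in.end(), out.begin(), f);
    return out;
}

// A "reduce" skeleton: combines all elements with an associative operator.
template <typename T, typename Op>
T reduce_skeleton(const std::vector<T>& in, T init, Op op) {
    return std::reduce(std::execution::par, in.begin(), in.end(), init, op);
}

int main() {
    std::vector<double> xs(1000, 2.0);
    // Compose skeletons: sum of squares, with no explicit thread management.
    auto squared = map_skeleton(xs, [](double x) { return x * x; });
    double sum = reduce_skeleton(squared, 0.0, std::plus<double>{});
    return sum > 0 ? 0 : 1;
}
```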

195 citations

Proceedings ArticleDOI
16 May 2011
TL;DR: This work proposes SkelCL, a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution; SkelCL greatly simplifies programming GPU systems.
Abstract: While CUDA and OpenCL made general-purpose programming for Graphics Processing Units (GPU) popular, using these programming approaches remains complex and error-prone because they lack high-level abstractions. Systems with multiple GPUs are especially challenging and are not addressed at all by these low-level programming models. We propose SkelCL -- a library providing so-called algorithmic skeletons that capture recurring patterns of parallel computation and communication, together with an abstract vector data type and constructs for specifying data distribution. We demonstrate that SkelCL greatly simplifies programming GPU systems. We report competitive performance results for SkelCL using both a simple Mandelbrot set computation and an industrial-strength medical imaging application. Because the library is implemented using OpenCL, it is portable across GPU hardware of different vendors.
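For flavor, a dot-product sketch in the style the paper describes: skeletons are customized with snippets of OpenCL C source and applied to the abstract vector type, while data transfers and kernel management stay hidden. This is a hedged reconstruction, not verbatim library code; the header layout and constructor signatures are assumptions:

```cpp
// Sketch of SkelCL-style code (reconstructed; exact API may differ).
#include <SkelCL/SkelCL.h>   // assumed header layout
#include <SkelCL/Vector.h>
#include <SkelCL/Zip.h>
#include <SkelCL/Reduce.h>

using namespace skelcl;

int main() {
    init();  // discover and initialize the available OpenCL devices

    // Skeletons are parameterized with customizing functions written in
    // OpenCL C; the library compiles them into complete kernels.
    Zip<float(float, float)> mult("float f(float x, float y) { return x * y; }");
    Reduce<float(float)>     sum ("float f(float x, float y) { return x + y; }");

    Vector<float> a(1024, 1.0f);
    Vector<float> b(1024, 2.0f);

    // Host-device transfers happen implicitly; data distribution across
    // multiple GPUs can be specified on the vectors.
    Vector<float> partial = mult(a, b);
    Vector<float> result  = sum(partial);
    return 0;
}
```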

132 citations

Journal ArticleDOI
TL;DR: It is argued that, like goto in sequential programs, send-receive should be avoided as far as possible and replaced by collective operations in the setting of message passing; the argument is presented in the context of MPI (Message Passing Interface).
Abstract: During the software crisis of the 1960s, Dijkstra's famous thesis "goto considered harmful" paved the way for structured programming. This short communication suggests that many current difficulties of parallel programming based on message passing are caused by poorly structured communication, which is a consequence of using low-level send-receive primitives. We argue that, like goto in sequential programs, send-receive should be avoided as far as possible and replaced by collective operations in the setting of message passing. We dispute some widely held opinions about the apparent superiority of pairwise communication over collective communication and present substantial theoretical and empirical evidence to the contrary in the context of MPI (Message Passing Interface).
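To see the analogy in code: a hand-rolled broadcast built from matching send-receive pairs can be replaced by a single collective call, which both shortens the program and frees the MPI library to choose an optimized algorithm. A minimal MPI sketch (standard MPI calls only):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> data(1 << 20);
    int count = static_cast<int>(data.size());

    // Unstructured style: the root loops over explicit point-to-point
    // sends. Correct, but it serializes the transfers and scatters the
    // communication logic across matching send/recv pairs -- the "goto"
    // of message passing.
    if (rank == 0) {
        for (int dst = 1; dst < size; ++dst)
            MPI_Send(data.data(), count, MPI_DOUBLE, dst, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(data.data(), count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    // Structured style: one collective expresses the same pattern and
    // lets the library use a tree or pipelined algorithm under the hood.
    MPI_Bcast(data.data(), count, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```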

108 citations

Journal ArticleDOI
01 Oct 2001
TL;DR: An experimental performance evaluation shows that the newly implemented MPI collective operations have significantly improved performance for large messages, and that there is a close match between the theoretical model and the measured completion times.
Abstract: Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: latency and bandwidth of WANs often are orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To optimize graph shape and segment size at runtime, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. An experimental performance evaluation shows that the newly implemented collective operations have significantly improved performance for large messages, and that there is a close match between the theoretical model and the measured completion times.
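For readers unfamiliar with the models: LogP characterizes a network by latency L, overhead o, gap g, and processor count P; P-LogP generalizes the overheads and gap to functions of the message size m. The following summary of the quantities and the segmentation argument is reconstructed in standard notation, not quoted from the paper:

```latex
% P-LogP parameters (size-dependent generalization of LogP):
%   L       end-to-end latency
%   o_s(m)  send overhead for an m-byte message
%   o_r(m)  receive overhead for an m-byte message
%   g(m)    gap: minimum interval between consecutive m-byte messages
%   P       number of processors
%
% Modeled time for a single m-byte point-to-point message:
T(m) \approx L + g(m)
% Splitting m bytes into k segments of size s pipelines the transfer,
% since a new segment can be injected every g(s) time units:
T_{\mathrm{seg}}(m, s) \approx L + k \cdot g(s),
\qquad k = \lceil m / s \rceil .
% Minimizing T_seg over s gives the optimal segment size at runtime.
```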

104 citations

Proceedings ArticleDOI
01 May 2000
TL;DR: This paper introduces a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length; with it, the optimal segment size and the best broadcast tree shape can be determined at runtime.
Abstract: Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: latency and bandwidth of WANs often are orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. In this paper we focus on optimized utilization of the (scarce) wide-area bandwidth. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To determine the best graph shape and segment size, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. With P-LogP, the optimal segment size and the best broadcast tree shape can be determined at runtime. (For conciseness, we restrict our discussion to the broadcast operation.) An experimental performance evaluation shows that the new broadcast has significantly improved performance (for large messages) and that there is a close match between the theoretical model and the measured completion times.
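Assuming the P-LogP estimate sketched above, a runtime system can choose the segment size by minimizing the modeled completion time over a table of measured gap values. A schematic C++ sketch (the gap table below is illustrative, and the paper's actual optimizer also searches over broadcast tree shapes):

```cpp
#include <cmath>
#include <map>

// Modeled time to ship an m-byte message as segments of size s over one
// WAN link, using the P-LogP estimate L + ceil(m/s) * g(s). L and g(s)
// come from measurements; the table in main() is made up for illustration.
double segmented_time(double L, double m, double s,
                      const std::map<double, double>& g) {
    double k = std::ceil(m / s);
    return L + k * g.at(s);
}

// Pick the measured segment size with the smallest modeled completion time.
double best_segment_size(double L, double m,
                         const std::map<double, double>& g) {
    double best_s = 0.0, best_t = INFINITY;
    for (const auto& [s, gs] : g) {
        double t = segmented_time(L, m, s, g);
        if (t < best_t) { best_t = t; best_s = s; }
    }
    return best_s;
}

int main() {
    // Illustrative measured gaps g(s) in microseconds (not real data).
    std::map<double, double> g = {{1024, 50}, {8192, 180}, {65536, 900}};
    double s = best_segment_size(/*L=*/5000, /*m=*/1 << 22, g);
    return s > 0 ? 0 : 1;
}
```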

88 citations


Cited by
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
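The canonical example in the paper is word count: map emits a (word, 1) pair for each word, and reduce sums the counts for each word. A single-process C++ sketch of these semantics (the real system runs the phases distributed across thousands of machines):

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using KV = std::pair<std::string, int>;

// map: document text -> list of (word, 1) intermediate pairs
std::vector<KV> map_fn(const std::string& doc) {
    std::vector<KV> out;
    std::istringstream in(doc);
    std::string w;
    while (in >> w) out.push_back({w, 1});
    return out;
}

// reduce: all intermediate values for one key -> total count
int reduce_fn(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
}

int main() {
    std::vector<std::string> docs = {"to be or not to be", "to see or not"};

    // Shuffle phase: group intermediate values by key.
    std::map<std::string, std::vector<int>> groups;
    for (const auto& d : docs)
        for (const auto& [word, one] : map_fn(d))
            groups[word].push_back(one);

    // Reduce phase: one invocation per distinct key.
    for (const auto& [word, counts] : groups)
        std::cout << word << " " << reduce_fn(counts) << "\n";
    return 0;
}
```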

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
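One concrete mechanism behind the scheduling of inter-machine communication: intermediate key/value pairs are split across R reduce tasks by a partitioning function, by default hash(key) mod R, so that all values for a given key reach the same reducer. A minimal sketch:

```cpp
#include <functional>
#include <string>

// Default MapReduce partitioning: route each intermediate key to one of
// R reduce tasks. All pairs sharing a key hash to the same partition,
// so each reducer sees every value for its keys.
int partition(const std::string& key, int R) {
    return static_cast<int>(std::hash<std::string>{}(key)
                            % static_cast<std::size_t>(R));
}

int main() {
    int R = 4;  // number of reduce tasks (illustrative)
    return partition("example-key", R) < R ? 0 : 1;
}
```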

17,663 citations

Book
01 Jan 2001
TL;DR: This book covers decision-theoretic foundations, including game theory, rationality, and intelligence, as well as the decision-analytic approach to games, clarifying the role of rationality in decision-making.
Abstract: Preface 1. Decision-Theoretic Foundations 1.1 Game Theory, Rationality, and Intelligence 1.2 Basic Concepts of Decision Theory 1.3 Axioms 1.4 The Expected-Utility Maximization Theorem 1.5 Equivalent Representations 1.6 Bayesian Conditional-Probability Systems 1.7 Limitations of the Bayesian Model 1.8 Domination 1.9 Proofs of the Domination Theorems Exercises 2. Basic Models 2.1 Games in Extensive Form 2.2 Strategic Form and the Normal Representation 2.3 Equivalence of Strategic-Form Games 2.4 Reduced Normal Representations 2.5 Elimination of Dominated Strategies 2.6 Multiagent Representations 2.7 Common Knowledge 2.8 Bayesian Games 2.9 Modeling Games with Incomplete Information Exercises 3. Equilibria of Strategic-Form Games 3.1 Domination and Rationalizability 3.2 Nash Equilibrium 3.3 Computing Nash Equilibria 3.4 Significance of Nash Equilibria 3.5 The Focal-Point Effect 3.6 The Decision-Analytic Approach to Games 3.7 Evolution, Resistance, and Risk Dominance 3.8 Two-Person Zero-Sum Games 3.9 Bayesian Equilibria 3.10 Purification of Randomized Strategies in Equilibria 3.11 Auctions 3.12 Proof of Existence of Equilibrium 3.13 Infinite Strategy Sets Exercises 4. Sequential Equilibria of Extensive-Form Games 4.1 Mixed Strategies and Behavioral Strategies 4.2 Equilibria in Behavioral Strategies 4.3 Sequential Rationality at Information States with Positive Probability 4.4 Consistent Beliefs and Sequential Rationality at All Information States 4.5 Computing Sequential Equilibria 4.6 Subgame-Perfect Equilibria 4.7 Games with Perfect Information 4.8 Adding Chance Events with Small Probability 4.9 Forward Induction 4.10 Voting and Binary Agendas 4.11 Technical Proofs Exercises 5. Refinements of Equilibrium in Strategic Form 5.1 Introduction 5.2 Perfect Equilibria 5.3 Existence of Perfect and Sequential Equilibria 5.4 Proper Equilibria 5.5 Persistent Equilibria 5.6 Stable Sets of Equilibria 5.7 Generic Properties 5.8 Conclusions Exercises 6. Games with Communication 6.1 Contracts and Correlated Strategies 6.2 Correlated Equilibria 6.3 Bayesian Games with Communication 6.4 Bayesian Collective-Choice Problems and Bayesian Bargaining Problems 6.5 Trading Problems with Linear Utility 6.6 General Participation Constraints for Bayesian Games with Contracts 6.7 Sender-Receiver Games 6.8 Acceptable and Predominant Correlated Equilibria 6.9 Communication in Extensive-Form and Multistage Games Exercises Bibliographic Note 7. Repeated Games 7.1 The Repeated Prisoner's Dilemma 7.2 A General Model of Repeated Games 7.3 Stationary Equilibria of Repeated Games with Complete State Information and Discounting 7.4 Repeated Games with Standard Information: Examples 7.5 General Feasibility Theorems for Standard Repeated Games 7.6 Finitely Repeated Games and the Role of Initial Doubt 7.7 Imperfect Observability of Moves 7.8 Repeated Games in Large Decentralized Groups 7.9 Repeated Games with Incomplete Information 7.10 Continuous Time 7.11 Evolutionary Simulation of Repeated Games Exercises 8. Bargaining and Cooperation in Two-Person Games 8.1 Noncooperative Foundations of Cooperative Game Theory 8.2 Two-Person Bargaining Problems and the Nash Bargaining Solution 8.3 Interpersonal Comparisons of Weighted Utility 8.4 Transferable Utility 8.5 Rational Threats 8.6 Other Bargaining Solutions 8.7 An Alternating-Offer Bargaining Game 8.8 An Alternating-Offer Game with Incomplete Information 8.9 A Discrete Alternating-Offer Game 8.10 Renegotiation Exercises 9. Coalitions in Cooperative Games 9.1 Introduction to Coalitional Analysis 9.2 Characteristic Functions with Transferable Utility 9.3 The Core 9.4 The Shapley Value 9.5 Values with Cooperation Structures 9.6 Other Solution Concepts 9.7 Coalitional Games with Nontransferable Utility 9.8 Cores without Transferable Utility 9.9 Values without Transferable Utility Exercises Bibliographic Note 10. Cooperation under Uncertainty 10.1 Introduction 10.2 Concepts of Efficiency 10.3 An Example 10.4 Ex Post Inefficiency and Subsequent Offers 10.5 Computing Incentive-Efficient Mechanisms 10.6 Inscrutability and Durability 10.7 Mechanism Selection by an Informed Principal 10.8 Neutral Bargaining Solutions 10.9 Dynamic Matching Processes with Incomplete Information Exercises Bibliography Index
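Since the Nash equilibrium (chapter 3) is the book's central solution concept, a one-line statement of the definition may help orient readers; the notation below is standard textbook notation, not quoted from the book:

```latex
% A strategy profile \sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) is a
% Nash equilibrium if no player can gain by deviating unilaterally:
u_i(\sigma_i^*, \sigma_{-i}^*) \;\ge\; u_i(\sigma_i, \sigma_{-i}^*)
\quad \text{for every player } i \text{ and every strategy } \sigma_i .
```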

3,569 citations