Showing papers by "Francesco Quaglia" published in 2003


Journal ArticleDOI
TL;DR: It is shown that, except for the case of minimal state granularity applications, nonblocking checkpointing allows improvement of the speed of the parallel execution, as compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode.
Abstract: Describes a nonblocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency in the execution of state saving and other simulation-specific operations (e.g., event list update, event execution) with the aim of removing the cost of recording state information from the completion time of the parallel simulation application. We present an implementation of a C library supporting nonblocking checkpointing on a myrinet-based cluster, which demonstrates the practical viability of this checkpointing mode on standard off-the-shelf hardware. Through an empirical study on classical parameterized synthetic benchmarks, we show that, except for the case of minimal state granularity applications, nonblocking checkpointing improves the speed of the parallel execution compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode. A performance study for the case of a personal communication system (PCS) simulation is additionally reported to point out the benefits of nonblocking checkpointing for a real-world application.

59 citations


Proceedings ArticleDOI
23 Jun 2003
TL;DR: A cost model for non-blocking checkpointing is presented, and a performance-effective re-synchronization semantic, called minimum cost re-synchronization (MC), is derived. Experimental results show that MC increases the execution speed of a Personal Communication System (PCS) simulation application.
Abstract: Checkpointing and Communication Library (CCL) is a recently developed software library implementing CPU-offloaded checkpointing functionalities in support of optimistic parallel simulation on myrinet clusters. Specifically, CCL implements a non-blocking execution mode of memory-to-memory data copy associated with checkpoint operations, based on data transfer capabilities provided by a programmable DMA engine on board myrinet network cards. Re-synchronization between CPU and DMA activities must sometimes be employed for several reasons, such as maintenance of data consistency, thus adding some overhead to (otherwise CPU cost-free) non-blocking checkpoint operations. In this paper we present a cost model for non-blocking checkpointing and derive a performance-effective re-synchronization semantic which we call minimum cost re-synchronization (MC). With this semantic, an occurrence of re-synchronization either commits an ongoing DMA-based checkpoint operation (causing suspension of CPU activities) or aborts the operation (with a possible increase in the expected rollback cost due to a reduced number of committed checkpoints) on the basis of a minimum overhead expectation evaluated through the cost model. We have implemented MC within CCL, and we also report experimental results demonstrating the performance benefits of this optimized re-synchronization semantic, in terms of increase in the execution speed, for a Personal Communication System (PCS) simulation application.
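
The commit-or-abort choice described in this abstract can be illustrated with a small C sketch. The abstract does not state the actual cost model, so everything here is an assumption made for illustration: the struct fields, the function names, and the two cost formulas (commit cost taken as the remaining DMA copy time, abort cost taken as a rollback probability times the extra replay time caused by the missing checkpoint). None of these names belong to CCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost inputs; the real CCL cost model is not given in the abstract. */
typedef struct {
    double remaining_dma_us; /* time the CPU would be suspended waiting for the DMA copy */
    double rollback_prob;    /* assumed probability that a rollback would need this checkpoint */
    double extra_replay_us;  /* assumed extra replay time if this checkpoint is not committed */
} resync_costs;

typedef enum { RESYNC_COMMIT, RESYNC_ABORT } resync_action;

/* Assumed cost of committing: the CPU stalls until the ongoing copy completes. */
static double commit_cost(const resync_costs *c)
{
    return c->remaining_dma_us;
}

/* Assumed cost of aborting: expected increase in rollback cost. */
static double abort_cost(const resync_costs *c)
{
    return c->rollback_prob * c->extra_replay_us;
}

/* Pick the action with the smaller expected overhead (ties favor commit). */
resync_action choose_resync(const resync_costs *c)
{
    assert(c != NULL);
    return commit_cost(c) <= abort_cost(c) ? RESYNC_COMMIT : RESYNC_ABORT;
}
```

Under these assumed formulas, a copy that is almost finished tends to be committed, while a copy with a long remaining transfer and a low chance of being needed tends to be aborted.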

5 citations


Proceedings ArticleDOI
10 Jun 2003
TL;DR: This work presents CCL v3.0 that, exploiting hardware features of more advanced M3M-PCI64C myrinet cards, supports multiprogrammed semiasynchronous checkpoints, and reports the results of the evaluation of those benefits for the case of a personal communication system simulation application.
Abstract: CCL (Checkpointing and Communication Library) is a recently developed software in support of optimistic parallel simulation on myrinet based clusters. Beyond classical low latency message delivery functionalities, this library implements CPU offloaded, semi-asynchronous checkpointing functionalities based on data transfer capabilities provided by a programmable DMA engine on board of myrinet network cards. The latest version of CCL (v2.4), designed for M2M-PCI32C myrinet cards, only supports monoprogrammed semi-asynchronous checkpoints. This forces resynchronization between CPU and DMA activities each time a new checkpoint request must be issued at the simulation application level while the last issued one is still being carried out by the DMA engine. In this paper we present CCL v3.0 that, exploiting hardware features of more advanced M3M-PCI64C myrinet cards, supports multiprogrammed semi-asynchronous checkpoints. The multiprogrammed approach allows higher degree of concurrency between checkpointing and other simulation specific operations carried out by the CPU, with obvious benefits on performance. We also report the results of the evaluation of those benefits for the case of a personal communication system simulation application.
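
The difference between monoprogrammed and multiprogrammed checkpoints can be sketched with a toy timing model in C. This is not CCL code: the function name, the `slots` parameter (how many checkpoint requests the DMA engine can hold at once, with 1 standing in for the monoprogrammed case), the fixed copy duration, and the assumption that the engine serves one copy at a time in FIFO order are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 16 /* hypothetical upper bound on outstanding requests */

/* Counts how many checkpoint requests find every request slot busy and so
 * force the CPU to re-synchronize with (wait for) the DMA engine.
 * cpu_gap_us[i] is the CPU work done before issuing request i. */
size_t count_forced_resyncs(const double *cpu_gap_us, size_t n,
                            double dma_us, size_t slots)
{
    double done[MAX_SLOTS]; /* completion times of outstanding copies (FIFO ring) */
    size_t head = 0, pending = 0, resyncs = 0;
    double now = 0.0, engine_free = 0.0;

    assert(cpu_gap_us != NULL && slots >= 1 && slots <= MAX_SLOTS);

    for (size_t i = 0; i < n; i++) {
        now += cpu_gap_us[i];

        /* Retire copies that have already finished. */
        while (pending > 0 && done[head] <= now) {
            head = (head + 1) % slots;
            pending--;
        }

        /* All slots busy: the CPU stalls until the oldest copy completes. */
        if (pending == slots) {
            now = done[head];
            head = (head + 1) % slots;
            pending--;
            resyncs++;
        }

        /* Enqueue the new copy behind any copy still in service. */
        double start = engine_free > now ? engine_free : now;
        engine_free = start + dma_us;
        done[(head + pending) % slots] = engine_free;
        pending++;
    }
    return resyncs;
}
```

In this model, four requests issued 10 microseconds apart with a 25 microsecond copy time force three re-synchronizations when `slots` is 1 and none when `slots` is 4, which mirrors the concurrency argument made in the abstract.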

3 citations