
Showing papers by "Francesco Quaglia published in 2018"


Proceedings Article•DOI•
14 May 2018
TL;DR: This article presents an innovative share-everything PDES system that provides fully non-blocking coordination of the threads when accessing shared data structures and fully speculative processing capabilities---Time Warp style processing---of the events.
Abstract: The share-everything PDES (Parallel Discrete Event Simulation) paradigm is based on fully sharing the ability to process any individual event across concurrent threads, rather than binding Logical Processes (LPs) and their events to threads. It allows concentrating, at any time, the computing power---the CPU-cores on board a shared-memory machine---towards the unprocessed events that stand closest to the current commit horizon of the simulation run. This fruitfully biases the delivery of the computing power towards the hot portion of the model execution trajectory. In this article we present an innovative share-everything PDES system that provides (1) fully non-blocking coordination of the threads when accessing shared data structures and (2) fully speculative processing capabilities---Time Warp style processing---of the events. As we show via an experimental study, our proposal can cope with hard workloads where both classical Time Warp systems---based on LPs-to-threads binding---and previous share-everything proposals---unable to exploit fully speculative processing of the events---tend to fail in delivering adequate performance.

20 citations


Posted Content•
TL;DR: This article presents a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata; the design is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.
Abstract: Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach clearly does not scale to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators---the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Conflict detection relies on conventional atomic machine instructions in the Read-Modify-Write (RMW) class. Furthermore, beyond improving scalability and performance, our approach avoids wasting clock cycles on spin-lock operations by threads that could in principle carry out their memory allocation/release in full concurrency. It is thus resilient to performance degradation---in the face of concurrent accesses---independently of the current level of fragmentation of the handled memory blocks.

5 citations


Proceedings Article•DOI•
30 Mar 2018
TL;DR: In this article, the authors consider the problem of maximizing the performance of multi-threaded applications under a power cap by dynamically tuning the thread-level parallelism and the power state of CPU-cores in combination.
Abstract: Energy consumption has become a core concern in computing systems. In this context, power capping is an approach that aims at ensuring that the power consumption of a system does not exceed a predefined threshold. Although various power capping techniques exist in the literature, they do not fit well the nature of multi-threaded workloads with shared data accesses and non-minimal thread-level concurrency. For these workloads, scalability may be limited by thread contention on hardware resources and/or data, to the point that performance may even decrease while increasing the thread-level parallelism, indicating a scarce ability to exploit the actual computing power available in highly parallel hardware. In this paper, we consider the problem of maximizing the performance of multi-threaded applications under a power cap by dynamically tuning the thread-level parallelism and the power state of the CPU-cores in combination. Based on experimental observations, we design a technique that adaptively identifies, in linear time within a bi-dimensional space, the optimal parallelism and power-state setting. We evaluated the proposed technique with different benchmark applications, using different methods for synchronizing threads when accessing shared data, and we compared it with other state-of-the-art power capping techniques.

4 citations


Proceedings Article•DOI•
14 May 2018
TL;DR: An innovative Time Warp architecture designed to efficiently run parallel simulations under a power cap is presented, which considers power usage as a foundational design principle, as opposed to classical power-unaware Time Warp design.
Abstract: Controlling power usage has become a core objective in modern computing platforms. In this article we present an innovative Time Warp architecture designed to efficiently run parallel simulations under a power cap. Our architectural organization considers power usage as a foundational design principle, as opposed to classical power-unaware Time Warp design. We provide early experimental results showing the potential of our proposal.

3 citations


Proceedings Article•DOI•
14 May 2018
TL;DR: The design of a middleware layer that allows ECS to be ported to distributed-memory clusters of machines is presented, retaining the enriched ECS programming model while enabling deployments of PDES models on convenient (Cloud-based) infrastructures.
Abstract: Over the years, Parallel Discrete Event Simulation (PDES) has been enriched with programming facilities to bypass state disjointness across the concurrent Logical Processes (LPs). New supports have been proposed, offering the programmer alternatives to message passing for coding complex relations among LPs. Along this path we find Event & Cross-State (ECS), which allows writing event handlers that perform in-place accesses to the state of any LP, by simply relying on pointers. This programming model has been shipped with a runtime support enabling concurrent speculative execution of LPs, limited to shared-memory machines. In this paper, we present the design of a middleware layer that allows ECS to be ported to distributed-memory clusters of machines. A core application of our middleware is to let ECS-coded models be hosted on top of (low-cost) resources from the Cloud. Overall, ECS-coded models no longer demand powerful shared-memory machines to execute in reasonable time. Thanks to our solution, we indeed retain the possibility to rely on the enriched ECS programming model while still enabling deployments of PDES models on convenient (Cloud-based) infrastructures. An experimental assessment of our proposal is also provided.

2 citations


Proceedings Article•DOI•
01 Sep 2018
TL;DR: In this paper, the authors present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict is materialized while handling its metadata.
Abstract: Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach does not scale to large thread counts, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators—the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article we explore an alternative approach to the scalability of memory allocation/release, which can still be combined with those literature proposals. We present a fully non-blocking buddy system that allows threads to proceed in parallel and commit their allocations/releases unless a conflict materializes while handling its metadata. Beyond improving scalability and performance, it is resilient to performance degradation in the face of concurrent accesses, independently of the current level of fragmentation of the handled memory blocks.

2 citations


Proceedings Article•DOI•
09 Dec 2018
TL;DR: This assessment illustrates the effects of the various tuning parameters related to the Share-Everything paradigm when the simulation models have a variable granularity, opening the way to a deeper understanding of this innovative paradigm.
Abstract: Modern advancements in computing architectures have been accompanied by new emergent paradigms to run Parallel Discrete Event Simulation models efficiently. Indeed, many new paradigms to effectively use the available underlying hardware have been proposed in the literature. Among these, the Share-Everything paradigm tackles massively-parallel shared-memory machines, in order to support speculative simulation by taking into account the limits and benefits related to this family of architectures. Previous results have shown how this paradigm outperforms traditional speculative strategies (such as data-separated Time Warp systems) whenever the granularity of the executed events is small. In this paper, we show the performance implications of this simulation-engine organization when the simulation models have a variable granularity. To this end, we have selected a traffic model tailored to smart-city simulation. Our assessment illustrates the effects of the various tuning parameters related to the approach, opening the way to a deeper understanding of this innovative paradigm.

Proceedings Article•DOI•
01 Dec 2018
TL;DR: An analytical model is presented that predicts the abort probability of transactions handled via read-validation schemes, which may lead to aborting doomed transactions early, thus saving CPU time and improving performance.
Abstract: Concurrency control protocols based on read-validation schemes allow transactions that are doomed to abort to keep running until a subsequent validation check reveals them as invalid. These late aborts do not favor the reduction of wasted computation and can penalize performance. To counteract this problem, we present an analytical model that predicts the abort probability of transactions handled via read-validation schemes. Our goal is to determine the suited points, along a transaction's lifetime, at which to carry out a validation check. This may lead to aborting doomed transactions early, thus saving CPU time. We show how to exploit the abort probability predictions returned by the model in combination with a threshold-based scheme to trigger read-validations. We also show how this approach can definitely improve performance, leading to up to 14% better turnaround, as demonstrated by experiments carried out with a port of the TPC-C benchmark to Software Transactional Memory.