
Showing papers by Francesco Quaglia published in 2014


Journal ArticleDOI
TL;DR: The accuracy and feasibility of TAS’s performance forecasting methodology are demonstrated via an extensive experimental study based on a fully fledged prototype implementation integrated with a popular open-source in-memory transactional data grid and industry-standard benchmarks generating a breadth of heterogeneous workloads.
Abstract: In this article, we introduce TAS (Transactional Auto Scaler), a system for automating the elastic scaling of replicated in-memory transactional data grids, such as NoSQL data stores or Distributed Transactional Memories. Applications of TAS range from online self-optimization of in-production applications to the automatic generation of QoS/cost-driven elastic scaling policies, as well as to support for what-if analysis on the scalability of transactional applications. In this article, we present the key innovation at the core of TAS, namely, a novel performance forecasting methodology that relies on the joint usage of analytical modeling and machine learning. By exploiting these two classically competing approaches in a synergic fashion, TAS achieves the best of the two worlds, namely, high extrapolation power and good accuracy, even when faced with complex workloads deployed over public cloud infrastructures. We demonstrate the accuracy and feasibility of TAS’s performance forecasting methodology via an extensive experimental study based on a fully fledged prototype implementation integrated with a popular open-source in-memory transactional data grid (Red Hat’s Infinispan) and industry-standard benchmarks generating a breadth of heterogeneous workloads.
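The abstract describes coupling analytical modeling with machine learning for throughput forecasting. The sketch below, with hypothetical names and a toy linear correction standing in for a real learner, only illustrates the general shape of such a hybrid predictor; it does not reproduce TAS's actual model or feature set.

```c
#include <stddef.h>

/* Workload features sampled from the running data grid (hypothetical set). */
typedef struct {
    double arrival_rate;      /* transactions per second offered to the grid */
    double write_fraction;    /* fraction of update transactions             */
    double avg_items_per_tx;  /* data items accessed per transaction         */
} workload_t;

/* Purely analytical throughput estimate for n nodes: a toy contention model
 * in which useful work shrinks as the conflict probability grows with n.   */
static double analytical_throughput(const workload_t *w, int n_nodes)
{
    double p_conflict = w->write_fraction * w->avg_items_per_tx * 0.01 * n_nodes;
    if (p_conflict > 0.95) p_conflict = 0.95;
    return n_nodes * w->arrival_rate * (1.0 - p_conflict);
}

/* Learned multiplicative correction: here a linear model whose coefficients
 * would come from offline training; a real system would use a richer learner. */
static double ml_correction(const workload_t *w, int n_nodes, const double coeff[4])
{
    return coeff[0] + coeff[1] * w->arrival_rate
                    + coeff[2] * w->write_fraction
                    + coeff[3] * n_nodes;
}

/* Hybrid forecast: analytical extrapolation scaled by the learned residual. */
double forecast_throughput(const workload_t *w, int n_nodes, const double coeff[4])
{
    return analytical_throughput(w, n_nodes) * ml_correction(w, n_nodes, coeff);
}
```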

27 citations


Proceedings ArticleDOI
26 May 2014
TL;DR: A combination of analytical and Machine Learning techniques is exploited to build a performance model that allows dynamically tuning the level of concurrency of applications based on Software Transactional Memory (STM).
Abstract: In this article we exploit a combination of analytical and Machine Learning (ML) techniques in order to build a performance model that allows dynamically tuning the level of concurrency of applications based on Software Transactional Memory (STM). Our mixed approach has the advantage of reducing the training time of pure machine learning methods, while avoiding the approximation errors typically affecting pure analytical approaches. Hence it allows very fast construction of highly reliable performance models, which can be promptly and effectively exploited for optimizing actual application runs. We also present a real implementation of a concurrency regulation architecture, based on the mixed modeling approach, which has been integrated with the open source TinySTM package, together with experimental data related to runs of applications taken from the STAMP benchmark suite that demonstrate the effectiveness of our proposal.
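A concurrency regulator of this kind can be pictured as a control loop that periodically samples transactional statistics, asks the performance model for the predicted throughput at each admissible thread count, and activates the best one. The sketch below uses hypothetical hooks (sample_stm_stats, predict_throughput, set_active_threads); it is not TinySTM's actual interface nor the paper's exact architecture.

```c
#include <unistd.h>

#define MAX_THREADS 32

/* Runtime statistics sampled from the STM layer (hypothetical). */
typedef struct {
    double commit_rate;   /* committed transactions per second */
    double abort_rate;    /* aborted transactions per second   */
    double avg_tx_length; /* mean transaction duration (usec)  */
} stm_stats_t;

/* Assumed hooks into the STM runtime and the performance model. */
extern void   sample_stm_stats(stm_stats_t *out);
extern double predict_throughput(const stm_stats_t *s, int threads);
extern void   set_active_threads(int threads);

/* Periodically re-tune the number of concurrently active threads. */
void concurrency_regulation_loop(unsigned interval_usec)
{
    for (;;) {
        stm_stats_t stats;
        sample_stm_stats(&stats);

        int    best_threads = 1;
        double best_tput = predict_throughput(&stats, 1);
        for (int t = 2; t <= MAX_THREADS; t++) {
            double tput = predict_throughput(&stats, t);
            if (tput > best_tput) { best_tput = tput; best_threads = t; }
        }
        set_active_threads(best_threads);
        usleep(interval_usec);
    }
}
```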

24 citations


Journal ArticleDOI
TL;DR: The Speculative Transactional Replication (STR) problem is formalized by means of a set of properties ensuring that transactions are never activated on inconsistent snapshots, as well as the minimality and completeness of the set of speculatively explored serialization orders.

22 citations


Proceedings ArticleDOI
18 May 2014
TL;DR: This article tackles transparent parallelization of Discrete Event Simulation (DES) models to be run on top of multi-core machines according to speculative schemes by introducing an advanced memory management architecture, able to efficiently detect read/write accesses by concurrent objects to any object's state in an application-transparent manner.
Abstract: In this article we tackle transparent parallelization of Discrete Event Simulation (DES) models to be run on top of multi-core machines according to speculative schemes. The innovation in our proposal lies in the fact that we consider a more general programming and execution model, compared to the one targeted by state-of-the-art PDES platforms, where the boundaries of the state portion accessible while processing an event at a specific simulation object do not limit access to the actual object state, or to shared global variables. Rather, the simulation object is allowed to access (and alter) the state of any other object, thus causing what we term cross-state dependency. We note that this model exactly complies with typical (easy to manage) sequential-style DES programming, where a (dynamically allocated) state portion of object A can be accessed by object B in either read or write mode (or both) by, e.g., passing a pointer to B as the payload of a scheduled simulation event. However, while read/write memory accesses performed in the sequential run are always guaranteed to observe (and to give rise to) a consistent snapshot of the state of the simulation model, consistency is not automatically guaranteed in case of parallelization and concurrent execution of simulation objects with cross-state dependencies. We cope with such a consistency issue, and its application-transparent support, in the context of parallel and optimistic executions. This is achieved by introducing an advanced memory management architecture, able to efficiently detect read/write accesses by concurrent objects to any object's state in an application-transparent manner, together with advanced synchronization mechanisms providing the advantage of exploiting parallelism in the underlying multi-core architecture while transparently handling both cross-state and traditional event-based dependencies. Our proposal targets Linux and has been integrated with the ROOT-Sim open source optimistic simulation platform, although its design principles, and most parts of the developed software, are of general relevance.
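The paper relies on an advanced memory management architecture, integrated with Linux and ROOT-Sim, to trap accesses by one simulation object to another object's state. Purely as a user-level illustration of the idea of transparent access detection, the sketch below keeps a "remote" object's state on a protected page and uses a SIGSEGV handler to observe the faulting access before re-enabling permissions; the actual architecture described in the paper is considerably more sophisticated and is not reproduced here.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void  *tracked_page;   /* state of a "remote" simulation object */
static size_t page_size;

/* On the first access to the protected page, record the cross-state access
 * and re-enable permissions so the faulting instruction can be restarted. */
static void access_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    if ((char *)info->si_addr >= (char *)tracked_page &&
        (char *)info->si_addr <  (char *)tracked_page + page_size) {
        /* Here a real runtime would mark a cross-state dependency. */
        mprotect(tracked_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    _exit(1);  /* genuine fault: bail out */
}

int main(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    tracked_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = access_handler;
    sigaction(SIGSEGV, &sa, NULL);

    /* Simulated event handler touching another object's state: the write
     * faults, the handler records it, and execution resumes transparently. */
    memset(tracked_page, 0, 64);
    printf("cross-state access detected and completed\n");
    return 0;
}
```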

17 citations


Proceedings ArticleDOI
22 Oct 2014
TL;DR: This article presents a waitfree shared memory GVT algorithm that requires no critical section and correct coordination across the processes while computing the GVT value is achieved via memory atomic operations, namely compare-and-swap.
Abstract: Global Virtual Time (GVT) is a powerful abstraction used to discriminate what events belong (and what do not belong) to the past history of a parallel/distributed computation. For high performance simulation systems based on the Time Warp synchronization protocol, where concurrent simulation objects are allowed to process their events speculatively and causal consistency is achieved via rollback/recovery techniques, GVT is used to determine which portion of the simulation can be considered as committed. Hence it is the base for actuating memory recovery (e.g. of obsolete logs that were taken in order to support state recoverability) and non-revocable operations (e.g. I/O). For shared memory implementations of simulation platforms based on the Time Warp protocol, the reference GVT algorithm is the one presented by Fujimoto and Hybinette [1]. However, this algorithm relies on critical sections that make it non-wait-free, and which can hamper scalability. In this article we present a wait-free shared memory GVT algorithm that requires no critical section. Rather, correct coordination across the processes while computing the GVT value is achieved via atomic memory operations, namely compare-and-swap. The price paid by our proposal is an increase in the number of GVT computation phases, as opposed to the single phase required by the proposal in [1]. However, as we show via the results of an experimental study, the wait-free nature of the phases carried out in our GVT algorithm pays off in reducing the actual cost incurred by the proposal in [1].
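The coordination primitive the abstract names is compare-and-swap. As a minimal sketch of that building block only (not the multi-phase algorithm the article describes), the code below lets worker threads contribute their local virtual times to a shared minimum without locks, using C11 atomics; the CAS retry loop shown here is lock-free, and it is the article's phase organization, not reproduced here, that yields the stated progress properties.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared candidate GVT value for the current computation phase. */
static _Atomic uint64_t gvt_candidate = UINT64_MAX;

/* Each worker publishes its local minimum virtual time (local simulation
 * clock plus timestamps of in-flight messages) without taking any lock. */
void contribute_local_min(uint64_t local_min)
{
    uint64_t cur = atomic_load(&gvt_candidate);
    while (local_min < cur) {
        /* CAS succeeds only if nobody lowered the candidate in the meantime;
         * on failure, cur is refreshed and the loop re-checks or exits.     */
        if (atomic_compare_exchange_weak(&gvt_candidate, &cur, local_min))
            break;
    }
}

/* Reset the candidate at the beginning of a new GVT computation phase. */
void start_new_phase(void)
{
    atomic_store(&gvt_candidate, UINT64_MAX);
}
```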

16 citations


Book ChapterDOI
25 Aug 2014
TL;DR: This study identifies several issues associated with the employment of techniques originally conceived for STM, proposes an innovative machine-learning-based technique explicitly designed to take into account the peculiarities of HTM systems, and demonstrates its advantages, in terms of higher accuracy and shorter learning times, using the STAMP benchmark suite.
Abstract: Transactional Memory (TM) is an emerging paradigm that promises to ease the development of parallel applications. Due to its inherently speculative nature, however, TM can suffer from performance degradation in the presence of conflict-intensive workloads. A key technique to tackle this issue consists in dynamically regulating the number of concurrent threads, which allows for selecting the concurrency level that best fits the intrinsic parallelism of specific applications. In this area, several self-tuning approaches have been proposed for Software-based implementations of TM (STM). In this paper we investigate the effectiveness of these techniques when applied to Hardware TM (HTM), a theme that is particularly relevant and timely given the recent integration of hardware support for TM in the next generation of mainstream Intel processors. Our study, conducted on Intel’s implementation of HTM, identifies several issues associated with the employment of techniques originally conceived for STM. Motivated by these findings, we propose an innovative machine-learning-based technique explicitly designed to take into account the peculiarities of HTM systems, and demonstrate its advantages, in terms of higher accuracy and shorter learning times, using the STAMP benchmark suite.
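For context, the hardware TM in the Intel processors mentioned here is exposed through the RTM instructions (_xbegin/_xend). The sketch below shows the conventional begin-with-lock-fallback pattern and how per-abort-cause counters, one plausible kind of input feature for a self-tuning mechanism, can be collected; the actual feature set and learner proposed in the paper are not reproduced.

```c
#include <immintrin.h>   /* RTM intrinsics; compile with -mrtm on RTM hardware */
#include <stdatomic.h>

/* Fallback lock, readable inside a hardware transaction so that the HTM
 * path and the lock-based path stay mutually exclusive.                   */
static _Atomic int fallback_lock = 0;

/* Abort statistics usable as input features by a concurrency tuner. */
static _Atomic unsigned long aborts_conflict, aborts_capacity, fallbacks;

/* Run a critical section as a hardware transaction, with a lock fallback. */
void htm_execute(void (*critical_section)(void *), void *arg, int max_retries)
{
    for (int attempt = 0; attempt < max_retries; attempt++) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (atomic_load(&fallback_lock))  /* a lock holder is active */
                _xabort(0xff);
            critical_section(arg);
            _xend();
            return;
        }
        if (status & _XABORT_CONFLICT) atomic_fetch_add(&aborts_conflict, 1);
        if (status & _XABORT_CAPACITY) atomic_fetch_add(&aborts_capacity, 1);
    }
    /* Too many aborts: acquire the fallback lock and run non-speculatively. */
    atomic_fetch_add(&fallbacks, 1);
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_lock, &expected, 1))
        expected = 0;
    critical_section(arg);
    atomic_store(&fallback_lock, 0);
}
```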

12 citations


Journal ArticleDOI
TL;DR: This paper presents a parallel invocation protocol, which exploits the path diversity along the end-to-end interaction toward the origin sites by concurrently routing transactional requests toward multiple edge servers.
Abstract: Edge computing is a powerful tool to face the challenging performance requirements of modern Internet applications. By replicating applications' data and logic across a large number of geographically distributed servers, edge computing platforms make it possible to significantly enhance both the proximity between clients and contents and the system scalability. These platforms prove highly effective when handling requests entailing read-only access to the application data, as these requests can be autonomously served by some edge server typically located closer to the client than the origin site. However, in contexts where end users can trigger transactional manipulations of the application state (e.g., e-Commerce, auctions or financial applications), the corresponding update requests typically need to be redirected to the origin transactional data sources, thus nullifying any performance benefit arising from data replication and client proximity. To cope with this issue, in this paper we present a parallel invocation protocol, which exploits the path diversity along the end-to-end interaction toward the origin sites by concurrently routing transactional requests toward multiple edge servers. Request processing is finally carried out by a single edge server, adaptively selected as the most responsive one depending on current system conditions. The proposed edge server selection scheme does not require coordination among (geographically distributed) edge server instances, and is thus very lightweight and scalable. The benefits of our protocol in terms of both reduced and more predictable end-to-end latency are quantified via an extended simulation study.
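The core idea is that the client fans a transactional request out to several edge servers and exactly one of them ends up processing it. The sketch below only mimics that selection locally, with threads standing in for edge servers and an atomic claim flag standing in for whatever single commit point the real protocol relies on; network transport, the adaptive responsiveness-based selection, and the paper's coordination-free scheme are deliberately abstracted away, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_EDGE_SERVERS 3

/* One claim flag per request: the first stand-in server to set it wins. */
typedef struct {
    _Atomic int claimed;
    int         request_id;
} request_t;

typedef struct {
    request_t *req;
    int        server_id;
} delivery_t;

/* Stand-in for an edge server receiving one copy of the request. */
static void *edge_server(void *arg)
{
    delivery_t *d = arg;
    int expected = 0;
    /* Only the first replica of the request is actually processed. */
    if (atomic_compare_exchange_strong(&d->req->claimed, &expected, 1))
        printf("request %d processed by edge server %d\n",
               d->req->request_id, d->server_id);
    return NULL;
}

int main(void)
{
    request_t  req = { .claimed = 0, .request_id = 42 };
    pthread_t  tid[NUM_EDGE_SERVERS];
    delivery_t del[NUM_EDGE_SERVERS];

    /* Client-side fan-out: the same request is routed to every server. */
    for (int i = 0; i < NUM_EDGE_SERVERS; i++) {
        del[i] = (delivery_t){ .req = &req, .server_id = i };
        pthread_create(&tid[i], NULL, edge_server, &del[i]);
    }
    for (int i = 0; i < NUM_EDGE_SERVERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```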

11 citations


Proceedings ArticleDOI
12 Feb 2014
TL;DR: This paper presents a solution that dynamically shrinks or enlarges the set of input features to be exploited by the machine-learner, which allows for tuning the concurrency level while also minimizing the overhead for input-features sampling.
Abstract: In this paper we explore machine-learning approaches for dynamically selecting a well-suited number of concurrent threads in applications relying on Software Transactional Memory (STM). Specifically, we present a solution that dynamically shrinks or enlarges the set of input features to be exploited by the machine learner. This allows for tuning the concurrency level while also minimizing the overhead of input-feature sampling, given that the cardinality of the input-feature set is always tuned to the minimum value that still guarantees reliability of workload characterization. We also present a fully fledged implementation of our proposal within the TinySTM open source framework, and provide the results of an experimental study relying on the STAMP benchmark suite, which show a significant reduction of the response time with respect to proposals based on static feature selection.
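The mechanism boils down to keeping only as many input features as are needed for the workload characterization to remain reliable. The sketch below shows one plausible control rule of that kind, with a hypothetical hook returning the learner's validation error on a restricted feature set; it is not the paper's actual algorithm.

```c
#define TOTAL_FEATURES 16

/* Assumed hook: validation error of the learner when restricted to the
 * first `count` features (features are assumed ranked by importance).   */
extern double model_error_with_features(int count);

/* Grow or shrink the active feature set so that prediction error stays
 * below `max_error` with the fewest features (lowest sampling overhead). */
int tune_feature_count(int current, double max_error)
{
    double err = model_error_with_features(current);

    if (err > max_error && current < TOTAL_FEATURES)
        return current + 1;   /* characterization unreliable: enlarge */

    if (current > 1 && model_error_with_features(current - 1) <= max_error)
        return current - 1;   /* still reliable with fewer features: shrink */

    return current;
}
```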

10 citations


Proceedings Article
01 Jan 2014
TL;DR: In this paper, the authors identify the strongest consistency and liveness guarantees that a DAP TM can ensure while maximizing efficiency in read-dominated workloads, and show that these guarantees can be used to break the wall of existing impossibility results on DAP TMs.
Abstract: Transactional Memory (TM) implementations guaranteeing disjoint-access parallelism (DAP) are desirable on multi-core architectures because they can exploit low-level parallelism. In this paper we look for a breach in the wall of existing impossibility results on DAP TMs, by identifying the strongest consistency and liveness guarantees that a DAP TM can ensure while maximizing efficiency in read-dominated workloads. Along the path of designing a protocol offering these guarantees, we report two impossibility results related to ensuring real-time order in a DAP TM.

6 citations


Book ChapterDOI
25 Aug 2014
TL;DR: Both programmability and performance aspects related to developing/supporting a multi-agent exploration model on top of the ROOT-Sim PDES platform, which supports ECS, are investigated.
Abstract: While the traditional objective of parallel/distributed simulation techniques has mainly been to improve performance and make very large models tractable, more recent research trends have targeted complementary aspects, such as the "ease of programming". Along this line, a recent proposal, called Event and Cross-State (ECS) synchronization, stands as a solution that breaks the traditional programming rules proper of Parallel Discrete Event Simulation (PDES) systems, where the application code processing a specific event is only allowed to access the state (namely, the memory image) of the target simulation object. In fact, with ECS the programmer is allowed to write ANSI-C event handlers capable of accessing, in either read or write mode, the state of whichever simulation object is included in the simulation model. Correct concurrent execution of events, e.g., on top of multi-core machines, is guaranteed by ECS with no intervention by the programmer, who is in practice exposed to a sequential-style programming model where events are processed one at a time and have the ability to access the current memory image of the whole simulation model, namely the collection of the states of any involved object. This can strongly simplify the development of specific models, e.g., by avoiding the need for passing state information across concurrent objects in the form of events. In this article we investigate both programmability and performance aspects related to developing/supporting a multi-agent exploration model on top of the ROOT-Sim PDES platform, which supports ECS.
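The programming model described here amounts to event handlers that may dereference pointers to other objects' states directly, leaving consistency to the runtime. The fragment below is a schematic ANSI-C handler in that style; the state layout, event type, and handler signature are invented for illustration and do not reproduce ROOT-Sim's actual API.

```c
#include <stddef.h>

/* Hypothetical states of an "explorer" agent and of a "region" object. */
typedef struct { int visited_cells; } agent_state_t;
typedef struct { int pheromone;     } region_state_t;

/* Payload of a MOVE event: a pointer to the destination region's state,
 * passed across objects exactly as a sequential-style model would do.   */
typedef struct { region_state_t *destination; } move_payload_t;

#define MOVE_EVENT 1

/* Event handler: under ECS-like synchronization the direct read/write of
 * another object's state is detected and synchronized by the runtime.    */
void process_event(agent_state_t *my_state, int event_type, void *payload)
{
    if (event_type == MOVE_EVENT) {
        move_payload_t *mv = payload;
        mv->destination->pheromone += 1;   /* write to another object's state */
        my_state->visited_cells    += 1;   /* update the agent's own state    */
    }
}
```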

4 citations


Proceedings Article
01 Jan 2014
TL;DR: This paper looks for a breach in the wall of existing impossibility results, by attempting to identify the strongest consistency and liveness guarantees that a TM can ensure while remaining scalable (by ensuring DAP) and maximizing efficiency in read-dominated workloads (by having invisible and wait-free read-only transactions).
Abstract: Transactional Memory (TM) is a powerful abstraction for synchronizing activities of different threads through transactions. TM implementations guaranteeing Disjoint-Access Parallelism (DAP) are highly desirable on current multi-core architectures because they can exploit low-level parallelism. Unfortunately, a number of results have been proved concerning the impossibility of implementing TMs that guarantee different variants of the DAP property, as well as alternative consistency and liveness criteria. This paper looks for a breach in the wall of existing impossibility results, by attempting to identify the strongest consistency and liveness guarantees that a TM can ensure while remaining scalable (by ensuring DAP) and maximizing efficiency in read-dominated workloads (by having invisible and wait-free read-only transactions). We show that implementing such a TM is indeed possible if one adopts as consistency criterion Extended Update Serializability, combined with a weaker variant of real-time order, which we name Witnessable Real-Time Order. Interestingly, the resulting semantics share a number of similarities with classic TM safety criteria like Opacity and Virtual World Consistency, while allowing for scalable and efficient implementations. Along the path of designing this protocol, we report two impossibility results related to ensuring real-time order in a weakly DAP TM that guarantees wait-free read-only transactions, considering different progress criteria and both visible and invisible read-only transactions. Finally, we also provide a lower bound on the space complexity of a strictly DAP TM that ensures a very weak consistency criterion, called Consistent View. We leverage this result to prove that the proposed protocol is optimal in terms of per-object version spatial utilization.

Posted Content
TL;DR: This paper presents a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific (distributed) protocol used to provide data consistency and/or transactional guarantees.
Abstract: In-memory (transactional) data stores are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, determining the appropriate number of cache servers to be deployed, and the degree of in-memory replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on how well cloud resources are actually exploited. To cope with the issue of determining optimized configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another peculiar aspect of the framework lies in the fact that it integrates simulation and machine-learning (black-box) techniques, the latter being essentially used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, making it infeasible to model via white-box (namely, purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures.
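The framework's distinguishing trait is plugging a machine-learned (black-box) model of the data-exchange layer into an otherwise event-driven (white-box) simulation. The skeleton below illustrates that split with hypothetical names only: the message-delivery delay comes from a learned predictor rather than from a simulated network, while the rest of the data grid model stays purely simulative.

```c
#include <stddef.h>

/* Hypothetical hooks of a skeleton data grid simulation model. */
extern double now(void);                                             /* simulation clock */
extern void   schedule_event(double time, int dst, int type, void *payload);
extern double predict_net_delay(int src, int dst, size_t msg_size);  /* learned model    */

#define MSG_DELIVERY 7

/* White-box logic decides *what* is exchanged between cache servers;
 * the black-box learned model decides *how long* the exchange takes. */
void send_grid_message(int src_server, int dst_server,
                       void *payload, size_t msg_size)
{
    double delay = predict_net_delay(src_server, dst_server, msg_size);
    schedule_event(now() + delay, dst_server, MSG_DELIVERY, payload);
}
```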