Showing papers by "Xuehai Qian" published in 2013


Proceedings Article
16 Mar 2013
TL;DR: Volition is presented, the first hardware scheme that detects SCVs in a relaxed-consistency machine precisely, in a scalable manner, and for an arbitrary number of processors in the cycle.
Abstract: Sequential Consistency (SC) is the most intuitive memory model, and SC Violations (SCVs) produce unintuitive, typically incorrect executions. Most prior SCV detection schemes have used data races as proxies for SCVs, which is highly imprecise. Other schemes that have targeted data-race cycles are either too conservative or are designed only for two-processor cycles and snoopy-based systems. This paper presents Volition, the first hardware scheme that detects SCVs in a relaxed-consistency machine precisely, in a scalable manner, and for an arbitrary number of processors in the cycle. Volition leverages cache coherence protocol transactions to dynamically detect cycles in memory-access orders across threads. When a cycle is about to occur, an exception is triggered. Volition can be used in both directory- and snoopy-based coherence protocols. Our simulations of Volition in a 64-processor multicore with directory-based coherence running SPLASH-2 and Parsec programs show that Volition induces negligible traffic and execution overhead. In addition, it can detect SCVs involving several processors. Volition is suitable for on-the-fly use.
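
The idea that an SCV corresponds to a cycle in memory-access orders across threads can be illustrated in software. The following is a minimal sketch, not Volition's hardware mechanism: it assumes accesses are given as labeled nodes, with program-order edges and observed inter-thread dependence edges supplied as pairs. The function name `has_cycle` and the access labels are hypothetical.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on DFS stack, finished
    color = defaultdict(int)              # every node starts WHITE

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:        # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# Two threads, each writing one variable and then reading the other.
program_order = [("P0:W x", "P0:R y"), ("P1:W y", "P1:R x")]

# Both reads return the old value, so each read is ordered before
# the other thread's write (assumed observation for this example).
reordered = [("P0:R y", "P1:W y"), ("P1:R x", "P0:W x")]

# P1's read sees P0's write, which is consistent with some SC interleaving.
sc_like = [("P0:R y", "P1:W y"), ("P0:W x", "P1:R x")]

scv_detected = has_cycle(program_order + reordered)
no_scv = has_cycle(program_order + sc_like)
```

In this sketch the first combination forms a four-access cycle spanning two processors, while the second does not; cycles spanning more processors are found the same way.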

26 citations


Proceedings Article
23 Feb 2013
TL;DR: Rainbow is proposed, which is based on Strata but records near-precise happens-before relations, reducing the number of logs and increasing the replay parallelism, and is the first R&R scheme that supports any relaxed memory consistency model.
Abstract: Architectures for record-and-replay (R&R) of multithreaded applications ease program debugging, intrusion analysis and fault-tolerance. Among the large body of previous works, Strata enables efficient memory dependence recording with little hardware overhead and can be applied smoothly to snoopy protocols. However, Strata records imprecise happens-before relations and assumes Sequential Consistency (SC) machines that execute memory operations in order. This paper proposes Rainbow, which is based on Strata but records near-precise happens-before relations, reducing the number of logs and increasing the replay parallelism. More importantly, it is the first R&R scheme that supports any relaxed memory consistency model. These improvements are achieved by two key techniques: (1) To compact logs, we propose the expandable spectrum (a spectrum is the region between two logs). It allows younger non-conflicting memory operations to be moved into an older spectrum, increasing the chance of reusing existing logs. (2) To identify the overlapped and incompatible spectra due to reordered memory operations, we propose an SC violation detection mechanism based on the existing logs; extra information can be recorded to reproduce the violations when they occur. Our simulation results with 10 SPLASH-2 benchmarks show that Rainbow reduces the log size by 26.6% and improves replay speed by 26.8% compared to Strata. The SC violations are few but do exist in the applications evaluated.

8 citations


Proceedings Article
07 Dec 2013
TL;DR: A model of chunk commit in a distributed directory protocol is presented, and two general techniques are proposed to attain scalable and fast commit: Serialization of the write sets of output-dependent chunks to avoid squashes and full parallelization of directory module ownership by the committing chunks.
Abstract: To help improve the programmability and performance of shared-memory multiprocessors, there are proposals for architectures that continuously execute atomic blocks of instructions, also called chunks. To be competitive, these architectures must support chunk operations very efficiently. In particular, in a large manycore with lazy conflict detection, they must support efficient chunk commit. This paper addresses the challenge of providing scalable and fast chunk commit for a large manycore in a lazy environment. To understand the problem, we first present a model of chunk commit in a distributed directory protocol. Then, to attain scalable and fast commit, we propose two general techniques: (1) Serialization of the write sets of output-dependent chunks to avoid squashes and (2) Full parallelization of directory module ownership by the committing chunks. Our simulation results with 64-threaded codes show that our combined scheme, called BulkCommit, eliminates most of the squash and commit stall times, speeding up the codes by an average of 40% and 18% compared to previously proposed schemes.
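
The first technique can be pictured with a small software model. This is only a sketch based on one reading of the abstract, not BulkCommit's protocol: it assumes each chunk is summarized by a read set and a write set of addresses, and that a committing chunk which overlaps another chunk only through writes (an output dependence) can be serialized instead of forcing a squash. The function `classify_conflict` and the dictionary layout are hypothetical.

```python
def classify_conflict(committer, other):
    """Decide how a committing chunk affects another in-flight chunk.

    Each chunk is a dict with 'reads' and 'writes' sets of addresses.
    """
    written = committer["writes"]
    if written & other["reads"]:
        # The other chunk consumed data the committer overwrites.
        return "squash"
    if written & other["writes"]:
        # Output dependence only: order the two write sets.
        return "serialize"
    return "none"

committing = {"reads": {0x10}, "writes": {0x20, 0x30}}
reader     = {"reads": {0x20}, "writes": {0x40}}
writer     = {"reads": {0x50}, "writes": {0x30}}
disjoint   = {"reads": {0x60}, "writes": {0x70}}

outcomes = [classify_conflict(committing, c) for c in (reader, writer, disjoint)]
```

Under these assumptions, only the chunk that read an overwritten address needs to be squashed; the write-only overlap is handled by ordering the commits.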

6 citations