Proceedings Article

POSTER: Fault-tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support

TLDR
This work proposes a software/hardware hybrid approach that leverages Intel's hardware transactional memory (TSX) to support implicit checkpoint creation and fast rollback; with the proposed hardware enhancements, the average performance overhead is 19%.
Abstract
Software-based fault-tolerance mechanisms can increase the reliability of multi-core CPUs while being cheaper and more flexible than hardware solutions such as lockstep architectures. However, checkpoint creation, error detection, and error correction entail high performance overhead if implemented in software. We propose a software/hardware hybrid approach that leverages Intel's hardware transactional memory (TSX) to support implicit checkpoint creation and fast rollback. Hardware enhancements are proposed and evaluated, resulting in an average performance overhead of 19%.
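The checkpoint, detect, and roll back cycle named in the abstract can be illustrated with a small software-only toy model. This is a sketch under stated assumptions, not the authors' implementation: it assumes errors are detected by comparing two redundant executions (the abstract does not specify the detection scheme), it takes the checkpoint explicitly in software where the paper relies on TSX to do so implicitly in hardware, and all names (`run_protected`, `increment`, `flip_bit`) are hypothetical.

```python
import copy


def run_protected(state, step, inject_fault=None, max_retries=3):
    """Toy model: run `step` twice from a checkpoint and commit only if both agree.

    Returns (new_state, retries). `inject_fault` is a hypothetical hook that
    corrupts one replica's result on the first attempt to simulate a transient fault.
    """
    checkpoint = copy.deepcopy(state)  # explicit software checkpoint (assumption)
    for attempt in range(max_retries + 1):
        # Each replica works on its own copy, so the checkpoint stays untouched.
        leading = step(copy.deepcopy(checkpoint))
        trailing = step(copy.deepcopy(checkpoint))
        if inject_fault is not None and attempt == 0:
            leading = inject_fault(leading)
        if leading == trailing:
            return leading, attempt  # results match: commit
        # Mismatch: discard both results and re-execute from the checkpoint.
    raise RuntimeError("results still disagree after all retries")


def increment(s):
    """Example workload: bump a counter."""
    s["counter"] += 1
    return s


def flip_bit(s):
    """Example fault: flip one bit of the counter in a copy of the result."""
    corrupted = dict(s)
    corrupted["counter"] ^= 4
    return corrupted


clean_result, clean_retries = run_protected({"counter": 41}, increment)
faulty_result, faulty_retries = run_protected({"counter": 41}, increment, inject_fault=flip_bit)
```

In this model the cost of every protected step includes a full state copy, which is the kind of software overhead the abstract argues hardware transactional memory can reduce.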


Citations
Book Chapter

Fault-Tolerant Execution on COTS Multi-core Processors with Hardware Transactional Memory Support

TL;DR: This paper proposes a software/hardware hybrid approach that targets Intel's current x86 multi-core platforms of the Core and Xeon families and leverages hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback.
References
Proceedings Article

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Journal Article

SPEC CPU2006 benchmark descriptions

TL;DR: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006, which replaces CPU2000. The SPEC CPU benchmarks are widely used in both industry and academia.
Proceedings Article

SWIFT: Software Implemented Fault Tolerance

TL;DR: A novel, software-only, transient-fault-detection technique, called SWIFT, which efficiently manages redundancy by reclaiming unused instruction-level resources present during the execution of most programs and provides a high level of protection and performance with an enhanced control-flow checking mechanism.
Proceedings Article

Transient fault detection via simultaneous multithreading

TL;DR: The concept of the sphere of replication is introduced, which abstracts both the physical redundancy of a lockstepped system and the logical redundancy of an SRT processor, and two mechanisms (slack fetch and branch outcome queue) are proposed and evaluated that enhance the performance of an SRT processor by allowing one thread to prefetch cache misses and branch results for the other thread.
Book

Architecture Design for Soft Errors

TL;DR: This book provides a comprehensive description of the architectural techniques to tackle the soft error problem, and covers the new methodologies for quantitative analysis of soft errors as well as novel, cost-effective architectural techniques to mitigate them.