
Showing papers by Milo M. K. Martin published in 2012


Journal ArticleDOI
TL;DR: On-chip cache coherence can scale gracefully as the number of cores increases, with bounded and modest costs, by combining shared caches that track cached copies, explicit cache eviction notifications, and hierarchical design.
Abstract: Today’s multicore chips commonly implement shared memory with cache coherence as low-level support for operating systems and application software. Technology trends continue to enable the scaling of the number of (processor) cores per chip. Because conventional wisdom says that coherence does not scale well to many cores, some prognosticators predict the end of coherence. This paper seeks to refute this conventional wisdom by showing one way to scale on-chip cache coherence with bounded, modest costs by combining known techniques such as shared caches augmented to track cached copies, explicit cache eviction notifications, and hierarchical design. Based on this scalable proof-of-concept design, we predict that on-chip coherence and the programming convenience and compatibility it provides are here to stay.
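
To make the bookkeeping concrete, here is a minimal Python sketch of the first two techniques the paper names: a shared cache that tracks which cores hold each block, with explicit eviction notifications keeping that tracking exact. All names are hypothetical illustrations, not the paper's design.

class SharedCacheDirectory:
    """Shared cache augmented to track cached copies
    (hypothetical sketch, not the paper's actual design)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.sharers = {}  # block address -> set of core ids with a copy

    def on_read(self, core, block):
        # Record that this core now holds a read-only copy.
        self.sharers.setdefault(block, set()).add(core)

    def on_write(self, core, block):
        # Invalidate exactly the cores that hold copies; no broadcast.
        for other in self.sharers.get(block, set()) - {core}:
            self.send_invalidation(other, block)
        self.sharers[block] = {core}

    def on_evict(self, core, block):
        # Explicit eviction notification: the tracking state stays exact,
        # so later writes never invalidate cores that hold no copy.
        holders = self.sharers.get(block)
        if holders:
            holders.discard(core)
            if not holders:
                del self.sharers[block]

    def send_invalidation(self, core, block):
        print(f"invalidate block {block:#x} in core {core}")

Because each block's sharer set grows with the core count, the paper's third technique, hierarchical design, is what keeps the per-block tracking cost bounded.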

298 citations


Journal ArticleDOI
25 Jan 2012
TL;DR: Vellvm provides a mechanized formal semantics of LLVM's intermediate representation, its type system, and properties of its SSA form; the framework includes multiple operational semantics and proves relations among them to facilitate different reasoning styles and proof techniques.
Abstract: This paper presents Vellvm (verified LLVM), a framework for reasoning about programs expressed in LLVM's intermediate representation and transformations that operate on it. Vellvm provides a mechanized formal semantics of LLVM's intermediate representation, its type system, and properties of its SSA form. The framework is built using the Coq interactive theorem prover. It includes multiple operational semantics and proves relations among them to facilitate different reasoning styles and proof techniques. To validate Vellvm's design, we extract an interpreter from the Coq formal semantics that can execute programs from the LLVM test suite and thus be compared against LLVM reference implementations. To demonstrate Vellvm's practicality, we formalize and verify a previously proposed transformation that hardens C programs against spatial memory safety violations. Vellvm's tools allow us to extract a new, verified implementation of the transformation pass that plugs into the real LLVM infrastructure; its performance is competitive with the non-verified, ad-hoc original.
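
As a loose analogy to extracting an executable interpreter from a formal semantics, the Python sketch below evaluates a toy single-assignment IR. The instruction set and names are invented for illustration; Vellvm's actual development formalizes LLVM's real IR in Coq.

# Toy single-assignment IR evaluator (illustrative analogy only; Vellvm
# formalizes LLVM's actual IR in Coq and extracts its interpreter).

def eval_block(instrs, env=None):
    """Evaluate a straight-line block where each register is assigned
    exactly once, mirroring the single-definition property of SSA."""
    env = dict(env or {})
    for dest, op, args in instrs:
        assert dest not in env, "SSA: each register is defined once"
        vals = [env[a] if isinstance(a, str) else a for a in args]
        if op == "add":
            env[dest] = vals[0] + vals[1]
        elif op == "mul":
            env[dest] = vals[0] * vals[1]
        elif op == "icmp_lt":
            env[dest] = vals[0] < vals[1]
        else:
            raise ValueError(f"unknown op {op}")
    return env

# %1 = add 2, 3 ; %2 = mul %1, 4 ; %3 = icmp_lt %2, 30
print(eval_block([("%1", "add", (2, 3)),
                  ("%2", "mul", ("%1", 4)),
                  ("%3", "icmp_lt", ("%2", 30))]))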

193 citations


Proceedings ArticleDOI
25 Feb 2012
TL;DR: A thermal design that incorporates phase-change materials to provide thermal capacitance enables parallel sprinting, which has the potential to achieve the task response time of a 16W chip within the thermal constraints of a 1W mobile platform.
Abstract: Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this paper explores activating otherwise powered-down cores for sub-second bursts of intense parallel computation. The approach exploits the concept of computational sprinting, in which a chip temporarily exceeds its sustainable thermal power budget to provide instantaneous throughput, after which the chip must return to nominal operation to cool down. To demonstrate the feasibility of this approach, we analyze the thermal and electrical characteristics of a smart-phone-like system that nominally operates a single core (~1W peak), but can sprint with up to 16 cores for hundreds of milliseconds. We describe a thermal design that incorporates phase-change materials to provide thermal capacitance to enable such sprints. We analyze image recognition kernels to show that parallel sprinting has the potential to achieve the task response time of a 16W chip within the thermal constraints of a 1W mobile platform.
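
A back-of-the-envelope model makes the thermal-capacitance idea concrete: a phase-change material (PCM) absorbs the excess heat of a sprint at roughly constant temperature until its latent heat is exhausted. The sketch below uses assumed material parameters, not the paper's measured values.

# Back-of-the-envelope sprint duration (all material numbers assumed,
# not taken from the paper).

sprint_power = 16.0        # W: all cores sprinting
sustainable_power = 1.0    # W: what the platform can dissipate steadily
excess_power = sprint_power - sustainable_power   # W absorbed by the PCM

pcm_mass = 3e-5            # kg of phase-change material (assumed)
latent_heat = 200e3        # J/kg latent heat of fusion (assumed)

thermal_budget = pcm_mass * latent_heat           # J the PCM can absorb
sprint_duration = thermal_budget / excess_power   # s

print(f"sprint lasts ~{sprint_duration * 1e3:.0f} ms")   # ~400 ms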

190 citations


Book ChapterDOI
07 Jul 2012
TL;DR: This paper establishes the equivalence of the axiomatic and operational specifications using both manual proof and extensive testing, and develops a SAT-based tool for evaluating possible outcomes of multi-threaded test programs, showing that this tool is significantly more efficient than a tool based on an operational specification.
Abstract: The growing complexity of hardware optimizations employed by multiprocessors leads to subtle distinctions among allowed and disallowed behaviors, posing challenges in specifying their memory models formally and accurately, and in understanding and analyzing the behavior of concurrent software. This complexity is particularly evident in the IBM® Power Architecture®, for which a faithful specification was published only in 2011 using an operational style. In this paper we present an equivalent axiomatic specification, which is more abstract and concise. Although not officially sanctioned by the vendor, our results indicate that this axiomatic specification provides a reasonable basis for reasoning about current IBM® POWER® multiprocessors. We establish the equivalence of the axiomatic and operational specifications using both manual proof and extensive testing. To demonstrate that the constraint-based style of axiomatic specification is more amenable to computer-aided verification, we develop a SAT-based tool for evaluating possible outcomes of multi-threaded test programs, and we show that this tool is significantly more efficient than a tool based on an operational specification.
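
To give a flavor of the constraint-based axiomatic style, the sketch below computes the allowed outcomes of the classic store-buffering litmus test under a sequentially consistent axiomatization by brute force. It is only an analogy: the paper's axioms capture the far weaker POWER model, and its tool encodes the constraints for a SAT solver rather than enumerating.

# Axiomatic-style outcome enumeration for the store-buffering litmus
# test, shown under sequential consistency for brevity.

from itertools import permutations

# T0: x = 1; r0 = y          T1: y = 1; r1 = x
events = [("T0", "W", "x", 1), ("T0", "R", "y", None),
          ("T1", "W", "y", 1), ("T1", "R", "x", None)]

def respects_program_order(order):
    # Axiom: each thread's events appear in program order.
    for tid in ("T0", "T1"):
        if [e for e in order if e[0] == tid] != \
           [e for e in events if e[0] == tid]:
            return False
    return True

def outcome(order):
    mem, regs = {"x": 0, "y": 0}, {}
    for tid, kind, var, val in order:
        if kind == "W":
            mem[var] = val
        else:
            regs[tid] = mem[var]
    return regs["T0"], regs["T1"]

allowed = {outcome(o) for o in permutations(events)
           if respects_program_order(o)}
print(allowed)   # (r0, r1) == (0, 0) is absent under SC; POWER allows it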

130 citations


Journal ArticleDOI
09 Jun 2012
TL;DR: This paper streamlines Watchdog's implementation to reduce runtime overhead and extends its mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overhead.
Abstract: Languages such as C and C++ use unsafe manual memory management, allowing simple bugs (e.g., accesses to an object after deallocation) to become the root cause of exploitable security vulnerabilities. This paper proposes Watchdog, a hardware-based approach for ensuring safe and secure manual memory management. Inspired by prior software-only proposals, Watchdog generates a unique identifier for each memory allocation, associates these identifiers with pointers, and checks to ensure that the identifier is still valid on every memory access. This use of identifiers and checks enables Watchdog to detect errors even in the presence of reallocations. Watchdog stores these pointer identifiers in a disjoint shadow space to provide comprehensive protection and ensure compatibility with existing code. To streamline the implementation and reduce runtime overhead, Watchdog (1) uses micro-ops to access metadata and perform checks, (2) eliminates metadata copies among registers via modified register renaming, and (3) uses a dedicated metadata cache to reduce checking overhead. Furthermore, this paper extends Watchdog's mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overheads.
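
A software sketch conveys the lock-and-key idea: every allocation gets a fresh identifier, pointers carry that identifier, and every access checks that the identifier is still valid. The Python below is purely illustrative; Watchdog itself does this in hardware via micro-ops, register renaming, and a metadata cache, and the names are ours.

# Software sketch of identifier-based ("lock and key") checking.

next_id = 0
live = {}               # allocation identifier -> backing storage

def wd_malloc(size):
    global next_id
    next_id += 1        # a fresh identifier for every allocation
    live[next_id] = bytearray(size)
    return {"id": next_id, "offset": 0}   # pointer plus its metadata

def wd_free(ptr):
    del live[ptr["id"]] # invalidates every outstanding copy of the id

def wd_load(ptr):
    # The check performed on every memory access: is the identifier
    # still valid? Reallocations get new ids, so stale pointers fail.
    if ptr["id"] not in live:
        raise RuntimeError("use-after-free detected")
    return live[ptr["id"]][ptr["offset"]]

p = wd_malloc(16)
q = dict(p)             # a second copy of the pointer
wd_free(p)
try:
    wd_load(q)
except RuntimeError as e:
    print(e)            # use-after-free detected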

124 citations


Proceedings ArticleDOI
11 Jun 2012
TL;DR: NeedlePoint is an open-source framework that allows selection and comparison of a wide range of interleaving exploration policies for bug detection proposed by prior work; the paper also formally proves that parallel PCT (PPCT) provides the same probabilistic coverage guarantees as PCT.
Abstract: Testing multithreaded programs is difficult as threads can interleave in a nondeterministic fashion. Untested interleavings can cause failures, but testing all interleavings is infeasible. Many interleaving exploration strategies for bug detection have been proposed, but their relative effectiveness and performance remains unclear as they often lack publicly available implementations and have not been evaluated using common benchmarks. We describe NeedlePoint, an open-source framework that allows selection and comparison of a wide range of interleaving exploration policies for bug detection proposed by prior work. Our experience with NeedlePoint indicates that priority-based probabilistic concurrency testing (the PCT algorithm) finds bugs quickly, but it runs only one thread at a time, which destroys parallelism by serializing executions. To address this problem we propose a parallel version of the PCT algorithm (PPCT). We show that the new algorithm outperforms the original by 5x when testing parallel programs on an eight-core machine. We formally prove that parallel PCT provides the same probabilistic coverage guarantees as PCT. Moreover, PPCT is the first algorithm that runs multiple threads while providing coverage guarantees.
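
For intuition, here is a minimal sketch of the PCT scheduling discipline that NeedlePoint's experiments build on: random initial thread priorities, d-1 random priority-change points, and always running the highest-priority runnable thread. Details are simplified and the names are ours, not NeedlePoint's API.

# Minimal sketch of the PCT scheduling discipline (simplified from the
# original algorithm; names are ours).

import random

def pct_schedule(threads, num_steps, d):
    """Run one randomized schedule over `threads`, a list of generators
    that each yield one step at a time, targeting bugs of depth d."""
    n = len(threads)
    order = random.sample(range(n), n)
    prio = {tid: d + rank for rank, tid in enumerate(order)}
    change_points = set(random.sample(range(num_steps), d - 1))
    alive = set(range(n))
    for step in range(num_steps):
        if not alive:
            break
        tid = max(alive, key=prio.get)   # highest-priority runnable thread
        try:
            next(threads[tid])           # execute exactly one step of it
        except StopIteration:
            alive.discard(tid)
        if step in change_points:
            prio[tid] = -step            # demote below every other thread

The loop above runs one thread per step, which is exactly the serialization PPCT removes: it schedules several threads concurrently while preserving PCT's probabilistic coverage guarantee.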

68 citations


Patent
16 Nov 2012
TL;DR: A multi-core processing system is described that uses computational sprinting to generate high levels of computational output for short periods of time at power consumption levels that are not sustainable over longer periods due to thermal and/or other constraints.
Abstract: A multi-core processing system that uses computational sprinting to generate high levels of computational output for short periods of time at power consumption levels that are not sustainable over longer periods of time due to thermal and/or other constraints. This is done using a number of processing cores that, when operated simultaneously, utilize available thermal capacity within the system to consume power and produce heat that is in excess of a thermal design power (TDP) of the system, but is tolerable because of the short period of operation. The system and/or method described herein may include thermal capacitors in the form of phase change materials (PCMs), may implement normal, sprint and/or cooling modes of operation, and may employ parallel sprinting, frequency sprinting, sprint pacing and/or sprint-and-rest techniques, to cite several possibilities.
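
The claimed modes suggest a simple thermal-budget controller. The sketch below is a hypothetical illustration of a normal/sprint/cooling mode machine, not the patented implementation.

# Hypothetical normal/sprint/cooling controller illustrating the claimed
# modes (an illustration only, not the patented implementation).

class SprintController:
    def __init__(self, thermal_budget_j, sustainable_w):
        self.budget = thermal_budget_j    # heat the PCM can absorb (J)
        self.stored = 0.0                 # heat currently stored (J)
        self.sustainable = sustainable_w  # steady-state dissipation (W)
        self.mode = "normal"

    def request_sprint(self, sprint_w, duration_s):
        # Sprint pacing: grant a sprint only if the remaining thermal
        # capacity can absorb its excess heat.
        excess = (sprint_w - self.sustainable) * duration_s
        if self.mode == "normal" and self.stored + excess <= self.budget:
            self.mode = "sprint"
            self.stored += excess         # PCM melts, absorbing the excess
            return True
        return False

    def tick(self, dt_s):
        # Sprint-and-rest: after sprinting, dissipate stored heat at the
        # sustainable rate until the PCM has re-solidified.
        self.stored = max(0.0, self.stored - self.sustainable * dt_s)
        self.mode = "cooling" if self.stored > 0 else "normal"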

26 citations


Dissertation
01 Jan 2012
TL;DR: This dissertation demonstrates the compatibility of a pointer-based approach that provides comprehensive memory safety for mostly unmodified C code at a low performance overhead, hardening legacy C/C++ code with minimal source code changes, and shows its effectiveness by detecting new and previously known memory safety errors in large code bases.
Abstract: The serious bugs and security vulnerabilities that result from C's lack of bounds checking and unsafe manual memory management are well known, yet C remains in widespread use. Unfortunately, C's arbitrary pointer arithmetic, conflation of pointers and arrays, and programmer-visible memory layout make retrofitting C with memory safety guarantees challenging. Existing approaches suffer from incompleteness, have high runtime overhead, or require non-trivial changes to the C source code. Thus far, these deficiencies have prevented widespread adoption of such techniques. This dissertation proposes mechanisms to provide comprehensive memory safety that work with mostly unmodified C code at a low performance overhead. We use a pointer-based approach where we maintain metadata with pointers and check every pointer dereference. To enable compatibility with existing code, we maintain the metadata for the pointers in memory in a disjoint metadata space, leaving the memory layout of the program intact. For detecting spatial violations, we maintain bounds metadata with every pointer. For detecting temporal violations, we also maintain a unique identifier metadata with each pointer. This pointer metadata is propagated with pointer operations and checked on pointer dereferences. Coupling disjoint metadata with a pointer-based approach enables comprehensive detection of all memory safety violations in unmodified C programs. This dissertation demonstrates the compatibility of this approach by hardening legacy C/C++ code with minimal source code changes. Further, this dissertation shows the effectiveness of the approach by detecting new memory safety errors and previously known memory safety errors in large code bases. To attain low performance overheads, this dissertation proposes efficient instantiations of this approach (1) within a compiler, (2) within hardware, and (3) with a hybrid hardware-accelerated compiler instrumentation that reduces the overhead of enforcing memory safety, thereby enabling their use in deployed systems.
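
The spatial half of the approach can be sketched in a few lines: bounds metadata lives in a disjoint table, is propagated (unchecked) through pointer arithmetic, and is checked on every dereference. The Python below illustrates the idea only; the dissertation's instantiations are in a C compiler and in hardware.

# Sketch of the spatial check with disjoint metadata: bounds live in a
# separate table, so the program's own memory layout is untouched.

metadata = {}   # stand-in for the disjoint shadow space: ptr -> (base, bound)

class CheckedPtr:
    def __init__(self, buf, base, bound, offset=0):
        self.buf, self.offset = buf, offset
        metadata[id(self)] = (base, bound)

    def __add__(self, k):
        # Pointer arithmetic propagates metadata but is never checked...
        base, bound = metadata[id(self)]
        return CheckedPtr(self.buf, base, bound, self.offset + k)

    def deref(self):
        # ...every dereference is checked against the original bounds.
        base, bound = metadata[id(self)]
        if not (base <= self.offset < bound):
            raise RuntimeError("spatial memory safety violation")
        return self.buf[self.offset]

arr = CheckedPtr(bytearray(8), base=0, bound=8)
print((arr + 3).deref())    # 0: in bounds
try:
    (arr + 8).deref()       # one past the end
except RuntimeError as e:
    print(e)                # spatial memory safety violation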

18 citations


01 Jan 2012
TL;DR: This paper proposes a new way to specify protocols using concolic snippets, that is, sample execution fragments that contain both concrete and symbolic values, and describes a prototype implementation for design of cache coherence protocols.
Abstract: With the maturing of computer-aided verification technology, there is an emerging opportunity to develop design tools that can transform the way systems are designed. In this paper, we propose a new way to specify protocols using concolic snippets, that is, sample execution fragments that contain both concrete and symbolic values. While the purely symbolic extreme is simply an alternative representation of the traditional communicating extended finite-state machines, and the purely concrete extreme is an instantiation of the “programming by examples” paradigm, our specification language allows the designer to specify the desired protocol using a mixture of symbolic state machines and concrete scenarios. Our synthesis engine generalizes the snippets into a transition function, which is then analyzed using a model checker with respect to high-level temporal-logic correctness requirements. We describe a prototype implementation for the design of cache coherence protocols built using (1) a straightforward enumeration of all expressions for transition functions, (2) a check for consistency with respect to concolic snippets using the SMT solver CVC3, and (3) a check for correctness using the model checker Murφ. We discuss our experience in designing classical cache coherence protocols using the proposed methodology.
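
The enumerate-and-check core of the synthesis engine can be sketched directly: generate candidate transition guards from a small grammar and keep those consistent with the designer's snippets. The toy below handles only concrete snippets over boolean state variables (all names invented); the actual tool checks symbolic snippets with CVC3 and verifies the generalized machine with Murφ.

# Toy enumerate-and-check synthesis loop over a tiny guard grammar.

from itertools import product

VARS = ["shared", "pending"]

def candidates():
    # All guards of the form [!]v1 AND/OR [!]v2 over the state variables.
    for v1, n1, op, v2, n2 in product(VARS, (False, True),
                                      ("and", "or"), VARS, (False, True)):
        def guard(state, v1=v1, n1=n1, op=op, v2=v2, n2=n2):
            a = state[v1] != n1          # apply optional negation
            b = state[v2] != n2
            return (a and b) if op == "and" else (a or b)
        yield f"{'!' * n1}{v1} {op} {'!' * n2}{v2}", guard

# Concrete snippets: (observed state, did the transition fire?)
snippets = [({"shared": True,  "pending": False}, True),
            ({"shared": False, "pending": False}, False),
            ({"shared": True,  "pending": True},  False)]

consistent = [name for name, g in candidates()
              if all(g(s) == fired for s, fired in snippets)]
print(consistent)   # e.g. ['shared and !pending', ...]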