Showing papers by "Srinivas Devadas published in 2013"

PDF

Open Access

Proceedings Article•DOI•

Path ORAM: an extremely simple oblivious RAM protocol

[...]

Emil Stefanov¹, Marten van Dijk², Elaine Shi³, Christopher W. Fletcher⁴, Ling Ren⁴, Xiangyao Yu⁴, Srinivas Devadas⁴ - Show less +3 more•Institutions (4)

University of California, Berkeley¹, University of Connecticut², University of Maryland, College Park³, Massachusetts Institute of Technology⁴

04 Nov 2013

TL;DR: Path ORAM as discussed by the authors is the most practical oblivious RAM protocol for small client storage known to date, which requires log 2 N / log X bandwidth overhead for block size B = X log N. Path ORAM has been adopted in the design of secure processors since its proposal.

...read moreread less

Abstract: We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme for small client storage known to date. We formally prove that Path ORAM requires log^2 N / log X bandwidth overhead for block size B = X log N. For block sizes bigger than Omega(log^2 N), Path ORAM is asymptotically better than the best known ORAM scheme with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.

...read moreread less

676 citations

Journal Article•DOI•

PUF Modeling Attacks on Simulated and Silicon Data

[...]

Ulrich Rührmair¹, Jan Sölter², Frank Sehnke, Xiaolin Xu³, Ahmed Mahmoud¹, Vera Stoyanova, Gideon Dror, Jürgen Schmidhuber⁴, Wayne Burleson³, Srinivas Devadas⁵ - Show less +6 more•Institutions (5)

Technische Universität München¹, Free University of Berlin², University of Massachusetts Amherst³, SUPSI⁴, Massachusetts Institute of Technology⁵

01 Nov 2013-IEEE Transactions on Information Forensics and Security

TL;DR: Numerical modeling attacks on several proposed strong physical unclonable functions (PUFs) are discussed, leading to new design requirements for secure electrical Strong PUFs, and will be useful to PUF designers and attackers alike.

...read moreread less

Abstract: We discuss numerical modeling attacks on several proposed strong physical unclonable functions (PUFs). Given a set of challenge-response pairs (CRPs) of a Strong PUF, the goal of our attacks is to construct a computer algorithm which behaves indistinguishably from the original PUF on almost all CRPs. If successful, this algorithm can subsequently impersonate the Strong PUF, and can be cloned and distributed arbitrarily. It breaks the security of any applications that rest on the Strong PUF's unpredictability and physical unclonability. Our method is less relevant for other PUF types such as Weak PUFs. The Strong PUFs that we could attack successfully include standard Arbiter PUFs of essentially arbitrary sizes, and XOR Arbiter PUFs, Lightweight Secure PUFs, and Feed-Forward Arbiter PUFs up to certain sizes and complexities. We also investigate the hardness of certain Ring Oscillator PUF architectures in typical Strong PUF applications. Our attacks are based upon various machine learning techniques, including a specially tailored variant of logistic regression and evolution strategies. Our results are mostly obtained on CRPs from numerical simulations that use established digital models of the respective PUFs. For a subset of the considered PUFs-namely standard Arbiter PUFs and XOR Arbiter PUFs-we also lead proofs of concept on silicon data from both FPGAs and ASICs. Over four million silicon CRPs are used in this process. The performance on silicon CRPs is very close to simulated CRPs, confirming a conjecture from earlier versions of this work. Our findings lead to new design requirements for secure electrical Strong PUFs, and will be useful to PUF designers and attackers alike.

...read moreread less

463 citations

Posted Content•

PUF Modeling Attacks on Simulated and Silicon Data.

[...]

Ulrich Rührmair¹, Jan Sölter, Frank Sehnke, Xiaolin Xu, Ahmed Mahmoud, Vera Stoyanova, Gideon Dror, Jürgen Schmidhuber, Wayne Burleson, Srinivas Devadas - Show less +6 more•Institutions (1)

Technische Universität München¹

01 Jan 2013-IACR Cryptology ePrint Archive

TL;DR: In this article, numerical modeling attacks on several PUFs are discussed. But the authors focus on strong PUFs, and do not consider weak PUFs such as XOR Arbiter PUFs and Lightweight Secure PUFs.

...read moreread less

Abstract: We discuss numerical modeling attacks on several proposed Strong Physical Unclonable Functions (PUFs). Given a set of challenge-response pairs (CRPs) of a Strong PUF, the goal of our attacks is to construct a computer algorithm which behaves indistinguishably from the original PUF on almost all CRPs. If successful, this algorithm can subsequently impersonate the Strong PUF, and can be cloned and distributed arbitrarily. It breaks the security of any applications that rest on the Strong PUF’s unpredictability and physical unclonability. Our method is less relevant for other PUF types such as Weak PUFs; see Section I-B for a detailed discussion of this topic. The Strong PUFs that we could attack successfully include standard Arbiter PUFs of essentially arbitrary sizes, and XOR Arbiter PUFs, Lightweight Secure PUFs, and Feed-Forward Arbiter PUFs up to certain sizes and complexities. We also investigate the hardness of certain Ring Oscillator PUF architectures in typical Strong PUF applications. Our attacks are based upon various machine learning techniques, including a specially tailored variant of Logistic Regression and Evolution Strategies. Our results are mostly obtained on CRPs from numerical simulations that use established digital models of the respective PUFs. For a subset of the considered PUFs — namely standard Arbiter PUFs and XOR Arbiter PUFs — we also lead proofs of concept on silicon data from both FPGAs and ASICs. Over four million silicon CRPs are used in this process. The performance on silicon CRPs is very close to simulated CRPs, confirming a conjecture from earlier versions of this work. Our findings lead to new design requirements for secure electrical Strong PUFs, and will be useful to PUF designers and attackers alike.

...read moreread less

318 citations

Path ORAM: An Extremely Simple Oblivious RAM Protocol

[...]

Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher W. Fletcher, Ling Ren, Xiangyao Yu, Srinivas Devadas - Show less +3 more

01 Nov 2013

TL;DR: It is formally proved that Path ORAM requires log^2 N / log X bandwidth overhead for block size B = X log N, and is asymptotically better than the best known ORAM scheme with small client storage.

...read moreread less

Abstract: National Science Foundation (U.S.). Graduate Research Fellowship Program (Grant DGE-0946797)

...read moreread less

183 citations

Proceedings Article•DOI•

Design space exploration and optimization of path oblivious RAM in secure processors

[...]

Ling Ren¹, Xiangyao Yu¹, Christopher W. Fletcher¹, Marten van Dijk¹, Srinivas Devadas¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

23 Jun 2013

Abstract: Keeping user data private is a huge problem both in cloud computing and computation outsourcing. One paradigm to achieve data privacy is to use tamper-resistant processors, inside which users' private data is decrypted and computed upon. These processors need to interact with untrusted external memory. Even if we encrypt all data that leaves the trusted processor, however, the address sequence that goes off-chip may still leak information. To prevent this address leakage, the security community has proposed ORAM (Oblivious RAM). ORAM has mainly been explored in server/file settings which assume a vastly different computation model than secure processors. Not surprisingly, naively applying ORAM to a secure processor setting incurs large performance overheads.In this paper, a recent proposal called Path ORAM is studied. We demonstrate techniques to make Path ORAM practical in a secure processor setting. We introduce background eviction schemes to prevent Path ORAM failure and allow for a performance-driven design space exploration. We propose a concept called super blocks to further improve Path ORAM's performance, and also show an efficient integrity verification scheme for Path ORAM. With our optimizations, Path ORAM overhead drops by 41.8%, and SPEC benchmark execution time improves by 52.4% in relation to a baseline configuration. Our work can be used to improve the security level of previous secure processors.

...read moreread less

118 citations

Posted Content•

Design Space Exploration and Optimization of Path Oblivious RAM in Secure Processors.

[...]

Ling Ren, Xiangyao Yu, Christopher W. Fletcher¹, Marten van Dijk, Srinivas Devadas - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2013-IACR Cryptology ePrint Archive

TL;DR: Path ORAM as mentioned in this paper is a secure processor that uses hidden memory to prevent address leakage in the off-chip, but it is not secure in the secure processor setting and it cannot be used in cloud computing.

...read moreread less

Abstract: Keeping user data private is a huge problem both in cloud computing and computation outsourcing. One paradigm to achieve data privacy is to use tamper-resistant processors, inside which users’ private data is decrypted and computed upon. These processors need to interact with untrusted external memory. Even if we encrypt all data that leaves the trusted processor, however, the address sequence that goes off-chip may still leak information. To prevent this address leakage, the security community has proposed ORAM (Oblivious RAM). ORAM has mainly been explored in server/file settings which assume a vastly different computation model than secure processors. Not surprisingly, naively applying ORAM to a secure processor setting incurs large performance overheads. In this paper, a recent proposal called Path ORAM is studied. We demonstrate techniques to make Path ORAM practical in a secure processor setting. We introduce background eviction schemes to prevent Path ORAM failure and allow for a performance-driven design space exploration. We propose a concept called super blocks to further improve Path ORAM’s performance, and also show an efficient integrity verification scheme for Path ORAM. With our optimizations, Path ORAM overhead drops by 41.8%, and SPEC benchmark execution time improves by 52.4% in relation to a baseline configuration. Our work can be used to improve the security level of previous secure processors.

...read moreread less

88 citations

Proceedings Article•DOI•

Heracles: a tool for fast RTL-based design space exploration of multicore processors

[...]

Michel A. Kinsy¹, Michael Pellauer², Srinivas Devadas¹•Institutions (2)

Massachusetts Institute of Technology¹, Intel²

11 Feb 2013

TL;DR: This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit, designed with a high degree of modularity to support fast exploration of future multicore processors of dierent topologies, routing schemes, processing elements, and memory system organizations.

...read moreread less

Abstract: This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural exploration and hardware-software co-design. The Heracles toolkit comprises the soft hardware (HDL) modules, application compiler, and graphical user interface. It is designed with a high degree of modularity to support fast exploration of future multicore processors of dierent topologies, routing schemes, processing elements (cores), and memory system organizations. It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units. The GUI allows the user to quickly congure and launch a system instance for easy factorial development and evaluation. Hardware modules are implemented in synthesizable Verilog and are FPGA platform independent. The Heracles tool is freely available under the open-source MIT license at: http://projects.csail.mit.edu/heracles.

...read moreread less

61 citations

Proceedings Article•DOI•

Integrity verification for path Oblivious-RAM

[...]

Ling Ren¹, Christopher W. Fletcher¹, Xiangyao Yu¹, Marten van Dijk¹, Srinivas Devadas¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

21 Nov 2013

TL;DR: This paper proposes an efficient integrity verification layer for Path ORAM, which only imposes 17% latency overhead, and shows that integrity verification is vital to maintaining privacy for recursive Path OrAMs under active adversaries.

...read moreread less

Abstract: Oblivious-RAMs (ORAM) are used to hide memory access patterns. Path ORAM has gained popularity due to its efficiency and simplicity. In this paper, we propose an efficient integrity verification layer for Path ORAM, which only imposes 17% latency overhead. We also show that integrity verification is vital to maintaining privacy for recursive Path ORAMs under active adversaries.

...read moreread less

42 citations

The locality-aware adaptive cache coherence protocol

[...]

George Kurian¹, Omer Khan², Srinivas Devadas¹•Institutions (2)

Massachusetts Institute of Technology¹, University of Connecticut²

01 Jun 2013

TL;DR: The U.S. Defense Advanced Research Projects Agency (DARPA) as discussed by the authors was the first to propose the Ubiquitous High Performance Computing (UHPC) program.

...read moreread less

Abstract: United States. Defense Advanced Research Projects Agency. The Ubiquitous High Performance Computing Program

...read moreread less

37 citations

Proceedings Article•DOI•

The locality-aware adaptive cache coherence protocol

[...]

George Kurian¹, Omer Khan², Srinivas Devadas¹•Institutions (2)

Massachusetts Institute of Technology¹, University of Connecticut²

23 Jun 2013

TL;DR: This work proposes a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines, and relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line.

...read moreread less

Abstract: Next generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impacts memory access latency and consumes power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce the overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.

...read moreread less

35 citations

Journal Article•DOI•

Optimal and Heuristic Application-Aware Oblivious Routing

[...]

Michel A. Kinsy¹, Myong Hyon Cho¹, Keun Sup Shim¹, Mieszko Lis¹, G.E. Suh², Srinivas Devadas¹ - Show less +2 more•Institutions (2)

Massachusetts Institute of Technology¹, Cornell University²

01 Jan 2013-IEEE Transactions on Computers

TL;DR: A framework for application-aware routing that assures deadlock freedom under one or more virtual channels by forcing routes to conform to an acyclic channel dependence graph is presented and it is shown that it is possible to achieve better performance than traditional deterministic and oblivious routing schemes on popular synthetic benchmarks using the bandwidth-sensitive approach.

...read moreread less

Abstract: Conventional oblivious routing algorithms do not take into account resource requirements (e.g., bandwidth, latency) of various flows in a given application. As they are not aware of flow demands that are specific to the application, network resources can be poorly utilized and cause serious local congestion. Also, flows, or packets, may share virtual channels in an undetermined way; the effects of head-of-line blocking may result in throughput degradation. In this paper, we present a framework for application-aware routing that assures deadlock freedom under one or more virtual channels by forcing routes to conform to an acyclic channel dependence graph. In addition, we present methods to statically and efficiently allocate virtual channels to flows or packets, under oblivious routing, when there are two or more virtual channels per link. Using the application-aware routing framework, we develop and evaluate a bandwidth-sensitive oblivious routing scheme that statically determines routes considering an application's communication characteristics. Given bandwidth estimates for flows, we present a mixed integer-linear programming (MILP) approach and a heuristic approach for producing deadlock-free routes that minimize maximum channel load. Our framework can be used to produce application-aware routes that target the minimization of latency, number of flows through a link, bandwidth, or any combination thereof. Our results show that it is possible to achieve better performance than traditional deterministic and oblivious routing schemes on popular synthetic benchmarks using our bandwidth-sensitive approach. We also show that, when oblivious routing is used and there are more flows than virtual channels per link, the static assignment of virtual channels to flows can help mitigate the effects of head-of-line blocking, which may impede packets that are dynamically competing for virtual channels. We experimentally explore the performance tradeoffs of static and dynamic virtual channel allocation on bandwidth-sensitive and traditional oblivious routing methods.

...read moreread less

Proceedings Article•DOI•

Generalized external interaction with tamper-resistant hardware with bounded information leakage

[...]

Xiangyao Yu¹, Christopher W. Fletcher¹, Ling Ren¹, Marten van Dijk², Srinivas Devadas¹ - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, University of Connecticut²

08 Nov 2013

TL;DR: Stream-Ascend significantly improves the generality and efficiency of Ascend in supporting many applications that fit into a streaming model, while maintaining the same security level, and is able to achieve a very high security level with small overheads for a large class of applications.

...read moreread less

Abstract: This paper investigates secure ways to interact with tamper-resistant hardware leaking a strictly bounded amount of information. Architectural support for the interaction mechanisms is studied and performance implications are evaluated.The interaction mechanisms are built on top of a recently-proposed secure processor Ascend[ascend-stc12]. Ascend is chosen because unlike other tamper-resistant hardware systems, Ascend completely obfuscates pin traffic through the use of Oblivious RAM (ORAM) and periodic ORAM accesses. However, the original Ascend proposal, with the exception of main memory, can only communicate with the outside world at the beginning or end of program execution; no intermediate information transfer is allowed.Our system, Stream-Ascend, is an extension of Ascend that enables intermediate interaction with the outside world. Stream-Ascend significantly improves the generality and efficiency of Ascend in supporting many applications that fit into a streaming model, while maintaining the same security level.Simulation results show that with smart scheduling algorithms, the performance overhead of Stream-Ascend relative to an insecure and idealized baseline processor is only 24.5%, 0.7%, and 3.9% for a set of streaming benchmarks in a large dataset processing application. Stream-Ascend is able to achieve a very high security level with small overheads for a large class of applications.

...read moreread less

Seec: a framework for self-aware management of goals and constraints in computing systems (power-aware computing, accuracy-aware computing, adaptive computing, autonomic computing)

[...]

Anant Agarwal¹, Srinivas Devadas¹, Henry Hoffman¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2013

TL;DR: This thesis describes the SEEC framework and evaluates it in several case studies, demonstrating that SEEC can have a positive impact on real systems by understanding high level goals and adapting to meet those goals online.

...read moreread less

Abstract: Modern computing systems require applications to balance competing goals, e.g., high performance and low power or high performance and high precision. Achieving the right balance for a particular application and system places an unrealistic burden on application programmers who must understand the power, performance, and precision implications of a variety of application and system configurations (e.g. ,changing algorithms or allocating cores). To address this problem, we propose the Self-aware Computing framework, or SEEC. SEEC automatically and dynamically configures systems and applications to meet goals accurately and efficiently. While other self-aware implementations have been proposed, SEEC is uniquely distinguished by its decoupled approach, which allows application and systems programmers to separately specify goals and configurations, each according to their expertise. SEEC's runtime decision engine observes and configures the system automatically, reducing programmer burden. This general and extensible decision engine employs both control theory and machine learning to reason about previously unseen applications and system configurations while automatically adapting to changes in both application and system behavior. This thesis describes the SEEC framework and evaluates it in several case studies. SEEC is evaluated by implementing its interfaces and runtime system on multiple, modern Linux x86 servers. Applications are then instrumented to emit goals and progress, while system services are instrumented to describe available adaptations. The SEEC runtime decision engine is then evaluated for its ability to meet goals accurately and efficiently. For example, SEEC is shown to meet performance goals with less than 3% average error while bringing average power consumption within 92% of optimal. SEEC is also shown to meet power goals with less than 2% average error while achieving over 96% of optimal performance on average. Additional studies show SEEC reacting to maintain performance in response to unexpected events including fluctuations in application workload and reduction in available resources. These studies demonstrate that SEEC can have a positive impact on real systems by understanding high level goals and adapting to meet those goals online. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs@mit.edu)

...read moreread less

Security and Reliability Properties of Syndrome Coding Techniques Used in PUF Key Generation

[...]

Meng-Day (Mandel) Yu, David M'Raihi, Srinivas Devadas, Ingrid Verbauwhede

01 Jan 2013

TL;DR: This work separates PUF stability algorithms into three Syndrome coding methods: Code-Offset; Index-Based Syndrome; Pattern Vector; and analyzes and compares these methods with a focus on security and reliability properties, including a comparison of relevant security assumptions as well as a comparison to ASIC PUF reliability data.

...read moreread less

Abstract: A Physical Unclonable Function (PUF) uniquely identifies identically manufactured silicon devices. To derive keys, a stability algorithm is required. Unlike conventional error correction used in communication systems, a PUF stability algorithm has a dual mandate of accounting for environmental noise while minimally disclosing keying material; the latter, security, aspect is generally not a concern for conventional error correction use cases. For the purpose of comparison, we classify PUF stability algorithms into three Syndrome coding methods: Code-Offset; Index-Based Syndrome; Pattern Vector. We analyze and compare these methods with a focus on security and reliability properties, including a comparison of relevant security assumptions as well as a comparison of relevant ASIC PUF reliability data.

...read moreread less

Posted Content•

Path ORAM: An Extremely Simple Oblivious RAM Protocol.

[...]

Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher W. Fletcher, Ling Ren, Xiangyao Yu, Srinivas Devadas - Show less +3 more

01 Jan 2013-IACR Cryptology ePrint Archive

TL;DR: Path ORAM as discussed by the authors is a simple oblivious RAM protocol with a small amount of client storage, which has a O(log N) bandwidth cost for blocks of size B = Ω(logN) bits.

...read moreread less

Abstract: We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(logN) bandwidth cost for blocks of size B = Ω(logN) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.

...read moreread less

Proceedings Article•DOI•

Hardware-level thread migration in a 110-core shared-memory multiprocessor

[...]

Mieszko Lis¹, Keun Sup Shim¹, Brandon Cho, Ilia Lebedev¹, Srinivas Devadas¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

01 Aug 2013

TL;DR: This work significantly reduces traffic on high-locality workloads up to 14x reduction in traffic in some benchmarks and automatically mapping allocation over cores not a trivial problem.

...read moreread less

Abstract: • Advantages - significantly reduces traffic on high-locality workloads up to 14x reduction in traffic in some benchmarks - simple to implement and verify (indep. of core count, no transient states) - decentralized & trivially scalable (only # core ID bits, addr ↔ core mapping) • Challenges - workloads should be optimized with memory model in mind (like allocating data on cache line boundaries but more coarse-grained) - automatically mapping allocation over cores not a trivial problem • Opportunities - fine-grained migration is an enabling technology - since it's cheap and responsive, can be used for almost anything - e.g., if only some cores have FPUs, migrate to access FPU

...read moreread less

Proceedings Article•DOI•

Authenticated storage using small trusted hardware

[...]

Hsin-Jung Yang¹, Victor Costan¹, Nickolai Zeldovich¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

08 Nov 2013

TL;DR: The proposed design achieves significantly higher throughput than previous designs by parallelizing server-side authentication operations and permitting the untrusted server to maintain caches and schedule disk writes, while enforcing precise crash recovery and write access control.

...read moreread less

Abstract: A major security concern with outsourcing data storage to third-party providers is authenticating the integrity and freshness of data. State-of-the-art software-based approaches require clients to maintain state and cannot immediately detect forking attacks, while approaches that introduce limited trusted hardware (e.g., a monotonic counter) at the storage server achieve low throughput. This paper proposes a new design for authenticating data storage using a small piece of high-performance trusted hardware attached to an untrusted server. The proposed design achieves significantly higher throughput than previous designs. The server-side trusted hardware allows clients to authenticate data integrity and freshness without keeping any mutable client-side state. Our design achieves high performance by parallelizing server-side authentication operations and permitting the untrusted server to maintain caches and schedule disk writes, while enforcing precise crash recovery and write access control.

...read moreread less

Journal Article•DOI•

Toward a Coherent Multicore Memory Model

[...]

Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Oct 2013-IEEE Computer

TL;DR: With exascale multicores, the question of how to efficiently support a shared memory model is of paramount importance as ever-growing core counts place higher demands on memory subsystems, and increasing on-chip distances mean that interconnect delays exert a significant effect on memory access latencies.

...read moreread less

Abstract: With exascale multicores, the question of how to efficiently support a shared memory model is of paramount importance. As programmers demand the convenience of coherent shared memory, ever-growing core counts place higher demands on memory subsystems, and increasing on-chip distances mean that interconnect delays exert a significant effect on memory access latencies.

...read moreread less

Design space exploration and optimization of path oblivious RAM in secure processors

[...]

Ling Ren, Xiangyao Yu, Christopher W. Fletcher¹, Marten van Dijk, Srinivas Devadas - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 2013

TL;DR: Path ORAM as discussed by the authors is a secure processor that uses hidden memory to prevent address leakage in the off-chip, but it is not secure in the secure processor setting and it cannot be used in cloud computing.

...read moreread less

Abstract: Keeping user data private is a huge problem both in cloud computing and computation outsourcing. One paradigm to achieve data privacy is to use tamper-resistant processors, inside which users’ private data is decrypted and computed upon. These processors need to interact with untrusted external memory. Even if we encrypt all data that leaves the trusted processor, however, the address sequence that goes off-chip may still leak information. To prevent this address leakage, the security community has proposed ORAM (Oblivious RAM). ORAM has mainly been explored in server/file settings which assume a vastly different computation model than secure processors. Not surprisingly, naively applying ORAM to a secure processor setting incurs large performance overheads. In this paper, a recent proposal called Path ORAM is studied. We demonstrate techniques to make Path ORAM practical in a secure processor setting. We introduce background eviction schemes to prevent Path ORAM failure and allow for a performance-driven design space exploration. We propose a concept called super blocks to further improve Path ORAM’s performance, and also show an efficient integrity verification scheme for Path ORAM. With our optimizations, Path ORAM overhead drops by 41.8%, and SPEC benchmark execution time improves by 52.4% in relation to a baseline configuration. Our work can be used to improve the security level of previous secure processors.

...read moreread less

Proceedings Article•DOI•

Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine

[...]

Keun Sup Shim¹, Mieszko Lis¹, Myong Hyon Cho¹, Ilia Lebedev¹, Srinivas Devadas¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

07 Nov 2013

TL;DR: The recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM2), establishes a new design point and significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration.

...read moreread less

Abstract: As transistor technology continues to scale, the architecture community has experienced exponential growth in design complexity and significantly increasing implementation and verification costs. Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip. Often, these large-core-count chips provide a shared memory abstraction via directories and coherence protocols, which have become notoriously error-prone and difficult to verify because of subtle data races and state space explosion. Although a very simple hardware shared memory implementation can be achieved by simply not allowing ad-hoc data replication and relying on remote accesses for remotely cached data (i.e., requiring no directories or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy. Our recently taped-out 110-core shared-memory processor, the Execution Migration Machine (EM2), establishes a new design point. On the one hand, EM2 supports shared memory but does not automatically replicate data, and thus preserves the simplicity of directoryless architectures. On the other hand, it significantly improves performance and energy over remote-access-only designs by exploiting data locality at remote cores via fast hardware-level thread migration. In this paper, we describe the design choices made in the EM2 chip as well as our choice of design methodology, and discuss how they combine to achieve design simplicity and verification efficiency. Even though EM2 is a fairly large design-110 cores using a total of 357 million transistors-the entire chip design and implementation process (RTL, verification, physical design, tapeout) took only 18 man-months.

...read moreread less

Proceedings Article•DOI•

MARTHA: architecture for control and emulation of power electronics and smart grid systems

[...]

Michel A. Kinsy¹, Ivan Celanovic¹, Omer Khan², Srinivas Devadas¹•Institutions (2)

Massachusetts Institute of Technology¹, University of Connecticut²

18 Mar 2013

TL;DR: This paper presents a novel Multicore Architecture for Real-Time Hybrid Applications (MARTHA) with time-predictable execution, low computational latency, and high performance that meets the requirements for control, emulation and estimation of next-generation power electronics and smart grid systems.

...read moreread less

Abstract: This paper presents a novel Multicore Architecture for Real-Time Hybrid Applications (MARTHA) with time-predictable execution, low computational latency, and high performance that meets the requirements for control, emulation and estimation of next-generation power electronics and smart grid systems. Generic general-purpose architectures running real-time operating systems (RTOS) or quality of service (QoS) schedulers have not been able to meet the hard real-time constraints required by these applications. We present a framework based on switched hybrid automata for modeling power electronics applications. Our approach allows a large class of power electronics circuits to be expressed as switched hybrid models which can be executed on a single hardware platform.

...read moreread less

Proceedings Article•DOI•

A framework to accelerate sequential programs on homogeneous multicores

[...]

Christopher W. Fletcher¹, Rachael Harding¹, Omer Khan², Srinivas Devadas¹•Institutions (2)

Massachusetts Institute of Technology¹, University of Connecticut²

25 Nov 2013

TL;DR: This work contributes two insights: that the dynamic optimization process is highly insensitive to runtime factors in homogeneous multicores and that the Partner core's view of application hot paths can be noisy, allowing the entire optimization process to be implemented with very little dedicated hardware in a multicore.

...read moreread less

Abstract: This paper presents a light-weight dynamic optimization framework for homogeneous multicores Our system profiles applications at runtime to detect hot program paths, and offloads the optimization of these paths to a Partner core Our work contributes two insights: (1) that the dynamic optimization process is highly insensitive to runtime factors in homogeneous multicores and (2) that the Partner core's view of application hot paths can be noisy, allowing the entire optimization process to be implemented with very little dedicated hardware in a multicore

...read moreread less

On-chip networks for manycore architecture

[...]

Srinivas Devadas¹, Myong Hyon Cho¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2013

TL;DR: In this article, the authors present three techniques for improving the efficiency of on-chip interconnects: path-based, randomized, oblivious, and minimal routing, and BAN (bandwidth adaptive networks).

...read moreread less

Abstract: Over the past decade, increasing the number of cores on a single processor has successfully enabled continued improvements of computer performance. Further scaling these designs to tens and hundreds of cores, however, still presents a number of hard problems, such as scalability, power efficiency and effective programming models. A key component of manycore systems is the on-chip network, which faces increasing efficiency demands as the number of cores grows. In this thesis, we present three techniques for improving the efficiency of on-chip interconnects. First, we present PROM (Path-based, Randomized, Oblivious, and Minimal routing) and BAN (Bandwidth Adaptive Networks), techniques that offer efficient intercore communication for bandwidth-constrained networks. Next, we present ENC (Exclusive Native Context), the first deadlock-free, fine-grained thread migration protocol developed for on-chip networks. ENC demonstrates that a simple and elegant technique in the on-chip network can provide critical functional support for higher-level application and system layers. Finally, we provide a realistic context by sharing our hands-on experience in the physical implementation of the on-chip network for the Execution Migration Machine, an ENC-based 110-core processor fabricated in 45nm ASIC technology. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs@mit.edu)

...read moreread less

Journal Article•DOI•

Let's stop trusting software with our sensitive data

[...]

Christopher W. Fletcher¹, M. van Dijk¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

05 Jun 2013-IEEE Design & Test of Computers

TL;DR: The paper states that people are trusting the cloud more and more to perform sensitive operations, and the Ascend processor attempts to achieve these goals; the only entity that the client has to trust is the processor itself.

...read moreread less

Abstract: The paper states that people are trusting the cloud more and more to perform sensitive operations. Demanding more trust in software systems is a recipe for disaster. Suppose the people only trust hardware manufacturers and cryptographers, and not system software developers, application programmers, or other software vendors. It will be the hardware manufacturer's job to produce a piece of hardware that provides some security properties. These properties will correspond to cryptographic operations being implemented correctly in the hardware and adding a modicum of physical security. The beauty of hardware is that its functionality is fixed. If we design our systems to only depend on hardware properties, then we need not worry about software changes or patches introducing new security holes-inevitable in current systems. How can it ensure privacy of data despite the practically infinite number of malicious programs out there? The Ascend processor attempts to achieve these goals; the only entity that the client has to trust is the processor itself.

...read moreread less

Patent•

Method,integrated circuit,and computer program product for signal generator based device security

[...]

Gookwon Edward Suh, Srinivas Devadas

21 Nov 2013