Showing papers by "Srinivas Devadas published in 2011"

PDF

Open Access

Book Chapter•DOI•

FPGA-Based true random number generation using circuit metastability with adaptive feedback control

[...]

Mehrdad Majzoobi¹, Farinaz Koushanfar¹, Srinivas Devadas²•Institutions (2)

Rice University¹, Massachusetts Institute of Technology²

28 Sep 2011

TL;DR: A novel and efficient method to generate true random numbers on FPGAs by inducing metastability in bi-stable circuit elements, e.g. flip-flops, by using precise programmable delay lines (PDL) that accurately equalize the signal arrival times to flip-Flops.

...read moreread less

Abstract: The paper presents a novel and efficient method to generate true random numbers on FPGAs by inducing metastability in bi-stable circuit elements, e.g. flip-flops. Metastability is achieved by using precise programmable delay lines (PDL) that accurately equalize the signal arrival times to flip-flops. The PDLs are capable of adjusting signal propagation delays with resolutions higher than fractions of a pico second. In addition, a real time monitoring system is utilized to assure a high degree of randomness in the generated output bits, resilience against fluctuations in environmental conditions, as well as robustness against active adversarial attacks. The monitoring system employs a feedback loop that actively monitors the probability of output bits; as soon as any bias is observed in probabilities, it adjusts the delay through PDLs to return to the metastable operation region. Implementation on Xilinx Virtex 5 FPGAs and results of NIST randomness tests show the effectiveness of our approach.

...read moreread less

144 citations

Proceedings Article•DOI•

Reliable and efficient PUF-based key generation using pattern matching

[...]

Zdenek Paral, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

05 Jun 2011

TL;DR: A novel and efficient method to reliably provision and re-generate a finite and exact sequence of bits, for use with cryptographic applications, e.g., as a key, by employing one or more challengeable Physical Unclonable Function (PUF) circuit elements is described.

...read moreread less

Abstract: We describe a novel and efficient method to reliably provision and re-generate a finite and exact sequence of bits, for use with cryptographic applications, e.g., as a key, by employing one or more challengeable Physical Unclonable Function (PUF) circuit elements. Our method reverses the conventional paradigm of using public challenges to generate secret PUF responses; it exposes response patterns and keeps secret the particular challenges that generate response patterns. The key is assembled from a series of small (initially chosen or random), secret integers, each being an index into a string of bits produced by the PUF circuit(s); a PUF unique pattern at each respective index is then persistently stored between provisioning and all subsequent key re-generations. To obtain the secret integers again, a newly repeated PUF output string is searched for highest-probability matches with the stored patterns. This means that complex error correction logic such as BCH decoders are not required. The method reveals only relatively short PUF output data in public store, thwarting opportunities for modeling attacks. We provide experimental results using data obtained from PUF ASICs, which show that keys can be efficiently and reliably generated using our scheme under extreme environmental variation.

...read moreread less

131 citations

Book Chapter•DOI•

Lightweight and secure PUF key storage using limits of machine learning

[...]

Meng-Day (Mandel) Yu, David M'Raihi, Richard Sowell, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

28 Sep 2011

TL;DR: A lightweight and secure key storage scheme using silicon Physical Unclonable Functions (PUFs) and a lightweight error correction code (ECC) encoder / decoder is described, which is 75% smaller than a previous provably secure implementation, and yet achieves robust environmental performance in 65nm FPGA and 0.13µ ASIC implementations.

...read moreread less

Abstract: A lightweight and secure key storage scheme using silicon Physical Unclonable Functions (PUFs) is described. To derive stable PUF bits from chip manufacturing variations, a lightweight error correction code (ECC) encoder / decoder is used. With a register count of 69, this codec core does not use any traditional error correction techniques and is 75% smaller than a previous provably secure implementation, and yet achieves robust environmental performance in 65nm FPGA and 0.13µ ASIC implementations. The security of the syndrome bits uses a new security argument that relies on what cannot be learned from a machine learning perspective. The number of Leaked Bits is determined for each Syndrome Word, reducible using Syndrome Distribution Shaping. The design is secure from a min-entropy standpoint against a machinelearning-equipped adversary that, given a ceiling of leaked bits, has a classification error bounded by e. Numerical examples are given using latest machine learning results.

...read moreread less

107 citations

Journal Article•DOI•

A method for probing the mutational landscape of amyloid structure

[...]

Charles W. O'Donnell, Jérôme Waldispühl¹, Mieszko Lis¹, Randal Halfmann¹, Srinivas Devadas¹, Susan Lindquist¹, Bonnie Berger¹ - Show less +3 more•Institutions (1)

Massachusetts Institute of Technology¹

01 Jul 2011

TL;DR: AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability and is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations.

...read moreread less

Abstract: Motivation: Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult with only a handful of NMR-based models proposed, suggesting a need for computational methods. Results: We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly-toxic ‘Iowa’ mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Availability: Our tool is publically available on the web at http://amyloid.csail.mit.edu/. Contact:lindquist_admin@wi.mit.edu; bab@csail.mit.edu Supplementary information:Supplementary data are available at Bioinformatics online.

...read moreread less

64 citations

Proceedings Article•DOI•

Scalable, accurate multicore simulation in the 1000-core era

[...]

Mieszko Lis¹, Pengju Ren², Myong Hyon Cho¹, Keun Sup Shim¹, Christopher W. Fletcher¹, Omer Khan¹, Srinivas Devadas¹ - Show less +3 more•Institutions (2)

Massachusetts Institute of Technology¹, Xi'an Jiaotong University²

10 Apr 2011

TL;DR: HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued worm-hole router NoC architecture, offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy.

...read moreread less

Abstract: We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed a factor of over 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11 ×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.

...read moreread less

64 citations

Patent•

Reliable puf value generation by pattern matching

[...]

Zdenek Paral, Srinivas Devadas

12 Dec 2011

TL;DR: In this article, a method was proposed to reliably provision and re-generate a finite and exact sequence of bits for use with cryptographic applications, e.g., as a key, by employing one or more challengeable PUF circuit elements.

...read moreread less

Abstract: A method is used to reliably provision and re-generate a finite and exact sequence of bits, for use with cryptographic applications, e.g., as a key, by employing one or more challengeable Physical Unclonable Function (PUF) circuit elements. The method reverses the conventional paradigm of using public challenges to generate secret PUF responses; it exposes the response and keeps the particular challenges that generate the response secret.

...read moreread less

47 citations

Proceedings Article•DOI•

Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

[...]

Michel A. Kinsy¹, Michael Pellauer¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

05 Sep 2011

TL;DR: This work shows different topology configurations of the Heracles system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board, and provides a small MIPS cross-compiler tool chain to assist in developing software for Heracles.

...read moreread less

Abstract: Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a fully bypassed, 7-stage pipelined microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler tool chain to assist in developing software for Heracles.

...read moreread less

40 citations

Journal Article•DOI•

Bridging the gap between single-template and fragment based protein structure modeling using Spanner

[...]

Mieszko Lis, Taeho Kim, Jamica Sarmiento, Daisuke Kuroda, Huy Dinh, Akira R. Kinjo, Karlou Mar Amada, Srinivas Devadas, Haruki Nakamura, and Daron M. Standley - Show less +6 more

01 Jan 2011-Immunome Research

TL;DR: The Ig model results suggest that when gap regions represent a significant fraction of the alignment, Spanner’s efficient use of fragment libraries, along with local sequence and secondary structural information, significantly improve model accuracy without a dra-matic increase in computational cost.

...read moreread less

Abstract: Background: As the coverage of experimentally determined protein structures increases, fragment-based structural modeling approaches are expected to play an ever more important role in structural modeling. Here we introduce a structural modeling method by which an initial structural template can be extended by the addition of structural fragments to more closely match an aligned query sequence. A database of pro-tein fragments indexed by their internal coordinates was created and a novel methodology for their retrieval was implemented. After fragment selection and assembly, sidechains are replaced and the all-atom model is refined by restrained energy minimization. We implemented the proposed method in the program Span-ner and benchmarked it using a previously published set of 367 immunoglobulin (Ig) loops, 206 historical query-template pairs and alignments from the Critical Assessment of protein Structure Prediction (CASP) experiment, and 217 structural alignments between remotely homologous query-template pairs. The con-straint-based modeling software MODELLER and previously reported results for RosettaAntibody, were used as references. Results: The error in the modeled structures was assessed by root-mean square deviation (RMSD) from the native structure, as a function of the query-template sequence identity. For the Ig benchmark set, for which a single fragment was used to model each loop, the average RMSD for Spanner (3 +/- 1.5 A) was found to lie midway between that of MODELLER (4 +/- 2 A) and RosettaAntibody (2 +/- 1 A). For the CASP and structural alignment benchmarks, for which gaps represent a small fraction of the modeled residues, the difference between Spanner and MODELLER were much smaller then the standard deviations of either program. The Spanner web server and source code are available at http://sysimm.ifrec.osaka-u.ac.jp/Spanner/. Conclusions: For typical homology modeling, Spanner is at least as good, on average as the template-free constraint-driven approach used by MODELLER. The Ig model results suggest that when gap regions represent a significant fraction of the alignment, Spanner’s efficient use of fragment libraries, along with local sequence and secondary structural information, significantly improve model accuracy without a dra-matic increase in computational cost.

...read moreread less

26 citations

Proceedings Article•DOI•

Time-Predictable Computer Architecture for Cyber-Physical Systems: Digital Emulation of Power Electronics Systems

[...]

Michel A. Kinsy¹, Omer Khan¹, Ivan Celanovic¹, Dusan Majstorovic², Nikola Celanovic², Srinivas Devadas¹ - Show less +2 more•Institutions (2)

Massachusetts Institute of Technology¹, University of Novi Sad²

29 Nov 2011

TL;DR: The design and application of a new ultrahigh speed real-time emulation platform for Hardware-in-the-Loop (HiL) testing and design of high-power power electronics systems, based on a reconfigurable, heterogeneous, multicore processor architecture that emulates power electronics, and includes a circuit compiler that translates graphic system models into processor executable machine code.

...read moreread less

Abstract: The smart grid concept is a good example of a complex cyber-physical system (CPS) that exhibits intricate interplay between control, sensing, and communication infrastructure on one side, and power processing and actuation on the other side. The more extensive use of computation, sensing, and communication, tightly coupled with power processing, calls for a fundamental reassessment of some of the prevailing paradigms in the real-time control and communication abstractions. Today these abstractions are mostly thought of as embedded systems, and the overall framework needs to be reformed in order to fully realize the potential of the emerging field of cyber-physical systems. This paper details the design and application of a new ultrahigh speed real-time emulation platform for Hardware-in-the-Loop (HiL) testing and design of high-power power electronics systems. Our real-time hardware emulation for HiL systems is based on a reconfigurable, heterogeneous, multicore processor architecture that emulates power electronics, and includes a circuit compiler that translates graphic system models into processor executable machine code. We present the hardware architecture, and describe the process of power electronic circuit compilation. This approach yields real-time execution on the order of 1µs simulation time step (including input/output latency) for a broad class of power electronics converters. To the best of our knowledge, no current academic or industrial HiL system has such a fast emulation response time. We present HiL experimental results for three representative systems: a variable speed induction motor drive, a utility grid connected photovoltaic converter system, and a hybrid electric vehicle motor drive.

...read moreread less

24 citations

Proceedings Article•DOI•

Memory coherence in the age of multicores

[...]

Mieszko Lis¹, Keun Sup Shim¹, Myong Hyon Cho¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

09 Oct 2011

TL;DR: Two new schemes that guarantee coherent shared memory without the complexity and overheads of a cache coherence protocol are described, namely execution migration and library caches coherence.

...read moreread less

Abstract: As we enter an era of exascale multicores, the question of efficiently supporting a shared memory model has become of paramount importance. On the one hand, programmers demand the convenience of coherent shared memory; on the other, growing core counts place higher demands on the memory subsystem and increasing on-chip distances mean that interconnect delays are becoming a significant part of memory access latencies. In this article, we first review the traditional techniques for providing a shared memory abstraction at the hardware level in multicore systems. We describe two new schemes that guarantee coherent shared memory without the complexity and overheads of a cache coherence protocol, namely execution migration and library cache coherence. We compare these approaches using an analytical model based on average memory latency, and give intuition for the strengths and weaknesses of each. Finally, we describe hybrid schemes that combine the strengths of different schemes.

...read moreread less

24 citations

Proceedings Article•DOI•

Deadlock-free fine-grained thread migration

[...]

Myong Hyon Cho¹, Keun Sup Shim¹, Mieszko Lis¹, Omer Khan¹, Srinivas Devadas¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

01 May 2011

TL;DR: This study introduces the Exclusive Native Context (ENC) protocol, a general, provably deadlock-free migration protocol for instruction-level thread migration architectures, and shows that ENC offers performance within 11.7% of an idealized deadlocked migration protocol with infinite resources.

...read moreread less

Abstract: Several recent studies have proposed fine-grained, hardware-level thread migration in multicores as a solution to power, reliability, and memory coherence problems. The need for fast thread migration has been well documented, however, a fast, deadlock-free migration protocol is sorely lacking: existing solutions either deadlock or are too slow and cumbersome to ensure performance with frequent, fine-grained thread migrations. In this study, we introduce the Exclusive Native Context (ENC) protocol, a general, provably deadlock-free migration protocol for instruction-level thread migration architectures. Simple to implement, ENC does not require additional hardware beyond common migration-based architectures. Our evaluation using synthetic migrations and the SPLASH-2 application suite shows that ENC offers performance within 11.7% of an idealized deadlock-free migration protocol with infinite resources.

...read moreread less

Proceedings Article•DOI•

Directoryless shared memory coherence using execution migration

[...]

Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Omer Khan, Srinivas Devadas - Show less +1 more

01 Dec 2011-Parallel and distributed computing and systems

TL;DR: This work argues that with EM scaling performance has much lower cost and design complexity than in directorybased coherence and traditional NUCA architectures: by merely scaling network bandwidth from 256 to 512 bit flits, the performance of the architecture improves by an additional 13%, while the baselines show negligible improvement.

...read moreread less

Abstract: We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using a execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data replication significantly reduces cache miss rates, while a fast network-level thread migration scheme takes advantage of shared data locality to reduce remote cache accesses that limit traditional NUCA performance. EM area and energy consumption are very competitive, and, on the average, it outperforms a directory-based MOESI baseline by 1.3 and a traditional S-NUCA design by 1.2 . We argue that with EM scaling performance has much lower cost and design complexity than in directorybased coherence and traditional NUCA architectures: by merely scaling network bandwidth from 256 to 512 bit flits, the performance of our architecture improves by an additional 13%, while the baselines show negligible improvement.

...read moreread less

Proceedings Article•DOI•

Security challenges and opportunities in adaptive and reconfigurable hardware

[...]

Victor Costan¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

05 Jun 2011

TL;DR: This work proposes augmenting regular cloud servers with a Trusted Computation Base (TCB) that can securely perform high-performance computations and achieves cost savings by spreading functionality across two paired chips.

...read moreread less

Abstract: We present a novel approach to building hardware support for providing strong security guarantees for computations running in the cloud (shared hardware in massive data centers), while maintaining the high performance and low cost that make cloud computing attractive in the first place. We propose augmenting regular cloud servers with a Trusted Computation Base (TCB) that can securely perform high-performance computations. Our TCB achieves cost savings by spreading functionality across two paired chips. We show that making a Field-Programmable Gate Array (FPGA) a part of the TCB benefits security and performance, and we explore a new method for defending the computation inside the TCB against side-channel attacks.

...read moreread less

Proceedings Article•DOI•

Brief announcement: distributed shared memory based on computation migration

[...]

Mieszko Lis¹, Keun Sup Shim¹, Myong Hyon Cho¹, Christopher W. Fletcher¹, Michel A. Kinsy¹, Ilia Lebedev¹, Omer Khan¹, Srinivas Devadas¹ - Show less +4 more•Institutions (1)

Massachusetts Institute of Technology¹

04 Jun 2011

TL;DR: A large number of microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years, but how will memory architectures scale and how will next-generation multicores be programmed?

...read moreread less

Abstract: Driven by increasingly unbalanced technology scaling and power dissipation limits, microprocessor designers have resorted to increasing the number of cores on a single chip, and pundits expect 1000-core designs to materialize in the next few years [1]. But how will memory architectures scale and how will these next-generation multicores be programmed? One barrier to scaling current memory architectures is the offchip memory bandwidth wall [1,2]: off-chip bandwidth grows with package pin density, which scales much more slowly than on-die transistor density [3]. To reduce reliance on external memories and keep data on-chip, today’s multicores integrate very large shared last-level caches on chip [4]; interconnects used with such shared caches, however, do not scale beyond relatively few cores, and the power requirements and access latencies of large caches exclude their use in chips on a 1000-core scale. For massive-scale multicores, then, we are left with relatively small per-core caches. Per-core caches on a 1000-core scale, in turn, raise the question of memory coherence. On the one hand, a shared memory abstraction is a practical necessity for general-purpose programming, and most programmers prefer a shared memory model [5]. On the other hand, ensuring coherence among private caches is an expensive proposition: bus-based and snoopy protocols don’t scale beyond relatively few cores, and directory sizes needed in cache-coherence protocols must equal a significant portion of the combined size of the per-core caches as otherwise directory evictions will limit performance [6]. Moreover, directory-based coherence protocols are notoriously difficult to implement and verify [7].

...read moreread less

Library Cache Coherence

[...]

Keun Sup Shim, Myong Hyon Cho, Mieszko Lis, Omer Khan, Srinivas Devadas - Show less +1 more

02 May 2011

TL;DR: Library Cache Coherence is presented, which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements, and has 1.85X less average memory latency than a MESI directory-based protocol on a set of benchmarks, even with a simple timestamp choosing algorithm.

...read moreread less

Abstract: Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines, and delay writes on the lines until all shared copies expire. The size of library is independent of the number of cores. By removing the complex invalidation process of directorybased cache coherence protocols, LCC generates fewer network messages. At the same time, LCC also allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85X less average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp choosing algorithm; moreover, our experimental results on LCC with an ideal timestamp scheme (though not implementable) show the potential of further improvement for LCC with more sophisticated timestamp schemes.

...read moreread less

Journal Article•DOI•

Efficient traversal of beta-sheet protein folding pathways using ensemble models.

[...]

Solomon Shenker¹, Charles W. O'Donnell², Srinivas Devadas², Bonnie Berger², Jérôme Waldispühl¹ - Show less +1 more•Institutions (2)

McGill University¹, Massachusetts Institute of Technology²

29 Sep 2011-Journal of Computational Biology

TL;DR: The program tFolder is introduced as an efficient method for modelling the folding process of large β-sheet proteins using sequence data alone and the accuracy of t folder is demonstrated to be comparable with state-of-the-art methods designed specifically for the contact prediction problem alone.

...read moreread less

Abstract: Molecular dynamics (MD) simulations can now predict ms-timescale folding processes of small proteins; however, this presently requires hundreds of thousands of CPU hours and is primarily applicable to short peptides with few long-range interactions. Larger and slower-folding proteins, such as many with extended β-sheet structure, would require orders of magnitude more time and computing resources. Furthermore, when the objective is to determine only which folding events are necessary and limiting, atomistic detail MD simulations can prove unnecessary. Here, we introduce the program tFolder as an efficient method for modelling the folding process of large β-sheet proteins using sequence data alone. To do so, we extend existing ensemble β-sheet prediction techniques, which permitted only a fixed anti-parallel β-barrel shape, with a method that predicts arbitrary β-strand/β-strand orientations and strand-order permutations. By accounting for all partial and final structural states, we can then model the transition from random coil to native state as a Markov process, using a master equation to simulate population dynamics of folding over time. Thus, all putative folding pathways can be energetically scored, including which transitions present the greatest barriers. Since correct folding pathway prediction is likely determined by the accuracy of contact prediction, we demonstrate the accuracy of tFolder to be comparable with state-of-the-art methods designed specifically for the contact prediction problem alone. We validate our method for dynamics prediction by applying it to the folding pathway of the well-studied Protein G. With relatively very little computation time, tFolder is able to reveal critical features of the folding pathways which were only previously observed through time-consuming MD simulations and experimental studies. Such a result greatly expands the number of proteins whose folding pathways can be studied, while the algorithmic integration of ensemble prediction with Markovian dynamics can be applied to many other problems.

...read moreread less

Proceedings Article•DOI•

ARCc: A case for an architecturally redundant cache-coherence architecture for large multicores

[...]

Omer Khan¹, Henry Hoffmann², Mieszko Lis², Farrukh Hijaz¹, Anant Agarwal², Srinivas Devadas² - Show less +2 more•Institutions (2)

University of Massachusetts Lowell¹, Massachusetts Institute of Technology²

09 Oct 2011

TL;DR: An online analytical model is presented implemented in the hardware that predicts performance and triggers a transition between the two coherence protocols at application-level granularity, which delivers up to 1.6× higher performance and up to lower energy consumption compared to the directory-based counterpart.

...read moreread less

Abstract: This paper proposes an architecturally redundant cache-coherence architecture (ARCc) that combines the directory and shared-NUCA based coherence protocols to improve performance, energy and dependability. Both coherence mechanisms co-exist in the hardware and ARCc enables seamless transition between the two protocols. We present an online analytical model implemented in the hardware that predicts performance and triggers a transition between the two coherence protocols at application-level granularity. The ARCc architecture delivers up to 1.6× higher performance and up to 1.5× lower energy consumption compared to the directory-based counterpart. It does so by identifying applications which benefit from the large shared cache capacity of shared-NUCA because of lower off-chip accesses, or where remote-cache word accesses are efficient.

...read moreread less

Patent•

Soft message signing

[...]

William Henry Bares, Srinivas Devadas, Vivek Khandelwal, Zdenek Paral, Richard Sowell, Tonghang Zhou - Show less +2 more

09 May 2011

TL;DR: In this paper, a message is signed using a PUF without having to exactly regenerate a cryptographic key, and another party that shares information about the PUF is able to verify the signature to a high degree of accuracy.

...read moreread less

Abstract: A message is signed using a PUF without having to exactly regenerate a cryptographic key. Another party that shares information about the PUF is able to verify the signature to a high degree of accuracy (i.e., high probability of rejection of a forged signature and a low probably of false rejection of a true signature). In some examples, the information shared by a recipient of a message signature includes a parametric model of operational characteristics of the PUF used to form the signature.

...read moreread less

Journal Article•DOI•

DCC: A Dependable Cache Coherence Multicore Architecture

[...]

Omer Khan¹, Mieszko Lis¹, Yildiz Sinangil¹, Srinivas Devadas¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2011-IEEE Computer Architecture Letters

TL;DR: A dependable cache coherence architecture (DCC) is proposed that combines the traditional directory protocol with a novel execution-migration-based architecture to ensure dependability that is transparent to the programmer.

...read moreread less

Abstract: Cache coherence lies at the core of functionally-correct operation of shared memory multicores. Traditional directory-based hardware coherence protocols scale to large core counts, but they incorporate complex logic and directories to track coherence states. Technology scaling has reached miniaturization levels where manufacturing imperfections, device unreliability and occurrence of hard errors pose a serious dependability challenge. Broken or degraded functionality of the coherence protocol can lead to a non-operational processor or user visible performance loss. In this paper, we propose a dependable cache coherence architecture (DCC) that combines the traditional directory protocol with a novel execution-migration-based architecture to ensure dependability that is transparent to the programmer. Our architecturally redundant execution migration architecture only permits one copy of data to be cached anywhere in the processor: when a thread accesses an address not locally cached on the core it is executing on, it migrates to the appropriate core and continues execution there. Both coherence mechanisms can co-exist in the DCC architecture and we present architectural extensions to seamlessly transition between the directory and execution migration protocols.

...read moreread less

High-speed real-time digital emulation for hardware-in-the-loop testing of power electronics: A new paradigm in the field of electronic design automation (EDA) for power electronics systems

[...]

Michel A. Kinsy¹, Dusan Majstorovic, Pierre Haessig, Jason Poon¹, Nikola Celanovic, Ivan Celanovic¹, Srinivas Devadas¹ - Show less +3 more•Institutions (1)

Massachusetts Institute of Technology¹

01 May 2011

TL;DR: This paper details the design and application of a new ultra-high speed real-time emulation for Hardware-in-the-Loop (HiL) testing and design of high-power power electronics systems based on a custom, heterogeneous, reconfigurable, multicore processor design that emulates power electronics, and includes a circuit compiler that translates graphic system models into processor executable machine code.

...read moreread less

Abstract: This paper details the design and application of a new ultra-high speed real-time emulation for Hardware-in-the-Loop (HiL) testing and design of high-power power electronics systems. Our real-time hardware emulation for HiL system is based on a custom, heterogeneous, reconfigurable, multicore processor design that emulates power electronics, and includes a circuit compiler that translates graphic system models into processor executable machine code. We present the digital processor architecture, and describe the process of power electronic circuit compilation. This approach yields real-time execution on the order of 1μs simulation time step (including input/output latency) for a broad class of power electronics converters. We present HiL experimental results for three representative systems: a variable speed induction motor drive, a utility grid connected photovoltaic converter system, and a hybrid electric vehicle motor drive.

...read moreread less

Journal Article•

Can we trust the chips of the future

[...]

Mohammad Tehranipour, Srinivas Devadas, K. Gotze, Farinaz Koushanfar, Miodrag Potkonjak, Ingrid Verbauwhede, David Yeh - Show less +3 more

01 Jan 2011-IEEE Design & Test of Computers

TL;DR: This roundtable is based on the topic of hardware security and trust, which was the focus of the IEEE International Symposium on Hardware-Oriented Security and Trust (HOST 2011) held with the 2011 Design Automation Conference.

...read moreread less

Abstract: This roundtable is based on the topic of hardware security and trust, which was the focus of the IEEE International Symposium on Hardware-Oriented Security and Trust (HOST 2011) held with the 2011 Design Automation Conference.

...read moreread less

Dissertation•

Ensemble modeling of beta-sheet proteins

[...]

Srinivas Devadas¹, Bonnie Berger¹, Susan Lindquist¹, Charles W. O'Donnell¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2011

TL;DR: New algorithms that enable the efficient modeling of protein structure ensembles and their sequence variants enable the identification of all energetically likely sequence/structure states for a family of proteins and advanced structure prediction, mutational analysis, and sequence alignment are introduced.

...read moreread less

Abstract: Our ability to characterize protein structure and dynamics is vastly outpaced by the speed of modern genetic sequencing, creating a growing divide between our knowledge of biological sequence and structure. Structural modeling algorithms offer the hope to bridge this gap through computational exploration of the sequence determinants of structure diversity. In this thesis, we introduce new algorithms that enable the efficient modeling of protein structure ensembles and their sequence variants. These statistical mechanics-based constructions enable the identification of all energetically likely sequence/structure states for a family of proteins. Beyond improved structure predictions, this approach enables a framework for thermodynamically-driven mutational and comparative analysis as well as the approximation of kinetic protein folding pathways. We have applied these techniques to two protein types that are notoriously difficult to characterize biochemically: transmembrane β-barrel proteins and amyloid fibrils. For these we advance the state-of-the-art in structure prediction, mutational analysis, and sequence alignment. Further, we have collaborated to apply these methods to open scientific questions about amyloid fibrils and bacterial biofilms. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

...read moreread less

Book Chapter•DOI•

Efficient traversal of beta-sheet protein folding pathways using ensemble models

[...]

Solomon Shenker¹, Charles W. O'Donnell², Srinivas Devadas², Bonnie Berger², Jérôme Waldispühl¹ - Show less +1 more•Institutions (2)

McGill University¹, Massachusetts Institute of Technology²

28 Mar 2011

...read moreread less

Abstract: Molecular Dynamics (MD) simulations can now predict mstimescale folding processes of small proteins -- however, this presently requires hundreds of thousands of CPU hours and is primarily applicable to short peptides with few long-range interactions. Larger and slowerfolding proteins, such as many with extended β-sheet structure, would require orders of magnitude more time and computing resources. Furthermore, when the objective is to determine only which folding events are necessary and limiting, atomistic detail MD simulations can prove unnecessary. Here, we introduce the program tFolder as an efficient method for modelling the folding process of large β-sheet proteins using sequence data alone. To do so, we extend existing ensemble β-sheet prediction techniques, which permitted only a fixed anti-parallel β-barrel shape, with a method that predicts arbitrary β-strand/β-strand orientations and strand-order permutations. By accounting for all partial and final structural states, we can then model the transition from random coil to native state as a Markov process, using a master equation to simulate population dynamics of folding over time. Thus, all putative folding pathways can be energetically scored, including which transitions present the greatest barriers. Since correct folding pathway prediction is likely determined by the accuracy of contact prediction, we demonstrate the accuracy of tFolder to be comparable with state-of-the-art methods designed specifically for the contact prediction problem alone. We validate our method for dynamics prediction by applying it to the folding pathway of the well-studied Protein G. With relatively very little computation time, tFolder is able to reveal critical features of the folding pathways which were only previously observed through time-consuming MD simulations and experimental studies. Such a result greatly expands the number of proteins whose folding pathways can be studied, while the algorithmic integration of ensemble prediction with Markovian dynamics can be applied to many other problems.

...read moreread less