
Showing papers by "Srinivas Devadas" published in 2017


Proceedings ArticleDOI
22 May 2017
TL;DR: Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of application-specific statements managed by an adversarial server, and decreases the bandwidth requirements of log auditors from 90GB to only tens of megabytes.
Abstract: We present Catena, an efficiently-verifiable Bitcoin witnessing scheme. Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of application-specific statements managed by an adversarial server. Catena implements a log as an OP_RETURN transaction chain and prevents forks in the log by leveraging Bitcoin's security against double spends. Specifically, if a log server wants to equivocate it has to double spend a Bitcoin transaction output. Thus, Catena logs are as hard to fork as the Bitcoin blockchain: an adversary without a large fraction of the network's computational power cannot fork Bitcoin and thus cannot fork a Catena log either. However, different from previous Bitcoin-based work, Catena decreases the bandwidth requirements of log auditors from 90GB to only tens of megabytes. More precisely, our clients only need to download all Bitcoin block headers (currently less than 35 MB) and a small, 600-byte proof for each statement in a block. We implement Catena in Java using the bitcoinj library and use it to extend CONIKS, a recent key transparency scheme, to witness its public-key directory in the Bitcoin blockchain, where it can be efficiently verified by auditors. We show that Catena can secure many systems today, such as public-key directories, Tor directory servers and software transparency schemes.
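As a rough illustration of the client-side audit described above, the sketch below walks a hypothetical chain of OP_RETURN transactions and checks that each statement's transaction spends the previous one's output and is committed under a known block header via a Merkle proof. The data model and function names are assumptions for illustration, not Catena's actual API.

```python
# Minimal sketch of Catena-style log auditing (hypothetical data model,
# not the paper's implementation). A thin client holds only the Bitcoin
# block headers and, per statement, a small transaction inclusion proof.
from dataclasses import dataclass
from hashlib import sha256
from typing import List, Tuple

def dbl_sha256(data: bytes) -> bytes:
    return sha256(sha256(data).digest()).digest()

@dataclass
class CatenaTx:
    txid: bytes               # hash of the serialized transaction
    prev_txid: bytes          # output of the previous log transaction it spends
    statement: bytes          # application statement carried in OP_RETURN
    merkle_branch: List[Tuple[bytes, bool]]  # (sibling hash, sibling-is-right)
    block_merkle_root: bytes  # Merkle root from the containing block header

def merkle_root_from_branch(txid: bytes, branch) -> bytes:
    h = txid
    for sibling, sibling_is_right in branch:
        pair = h + sibling if sibling_is_right else sibling + h
        h = dbl_sha256(pair)
    return h

def audit_log(genesis_txid: bytes, chain: List[CatenaTx], known_roots: set) -> bool:
    """Accept the log only if each transaction spends its predecessor's output
    and is anchored under a Merkle root taken from a known block header."""
    prev = genesis_txid
    for tx in chain:
        if tx.prev_txid != prev:
            return False                      # chain is not linear
        if tx.block_merkle_root not in known_roots:
            return False                      # not committed in a known header
        if merkle_root_from_branch(tx.txid, tx.merkle_branch) != tx.block_merkle_root:
            return False                      # Merkle proof does not check out
        prev = tx.txid
    return True
```

The point of the chaining check is that producing two valid successors of the same prev_txid would require a Bitcoin double spend, which is exactly the fork-prevention argument in the abstract.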

130 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: This work introduces a verification methodology based on a trusted abstract platform (TAP), a formalization of idealized enclave platforms along with a parameterized adversary, and formalizes the notion of secure remote execution.
Abstract: Recent proposals for trusted hardware platforms, such as Intel SGX and the MIT Sanctum processor, offer compelling security features but lack formal guarantees. We introduce a verification methodology based on a trusted abstract platform (TAP), a formalization of idealized enclave platforms along with a parameterized adversary. We also formalize the notion of secure remote execution and present machine-checked proofs showing that the TAP satisfies the three key security properties that entail secure remote execution: integrity, confidentiality and secure measurement. We then present machine-checked proofs showing that SGX and Sanctum are refinements of the TAP under certain parameterizations of the adversary, demonstrating that these systems implement secure enclaves for the stated adversary models.

113 citations


Journal ArticleDOI
TL;DR: A fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a “noise-avoiding trapdoor” is presented.
Abstract: We present a fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a “noise-avoiding trapdoor.” Using this computational fuzzy extractor, we present a stateless construction of a cryptographically-secure Physical Unclonable Function. Our construct requires no non-volatile (permanent) storage, secure or otherwise, and its computational security can be reduced to the hardness of an LPN variant under the random oracle model. The construction is “stateless,” because there is no information stored between subsequent queries, which mitigates attacks against the PUF via tampering. Moreover, our stateless construction corresponds to a PUF whose outputs are free of noise because of internal error-correcting capability, which enables a host of applications beyond authentication. We describe the construction, provide a proof of computational security and an analysis of the security parameter for system parameter choices, and present experimental evidence that the construction is practical and reliable under a wide environmental range.
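To make the LPN connection concrete, here is a toy illustration (my own sketch, not the paper's construction) of the noise-avoiding trapdoor idea: LPN samples $b_i = \langle a_i, s\rangle \oplus e_i$ hide the secret $s$, but a decoder that knows which equations are noise-free can recover $s$ by Gaussian elimination over GF(2).

```python
# Toy illustration of the "noise-avoiding trapdoor" idea behind the LPN-based
# fuzzy extractor (not the paper's construction). Vectors are packed into ints.
import random

def lpn_samples(s: int, n: int, m: int, noise_rate: float):
    """Return m samples (a_i, b_i) over an n-bit secret s, plus the hidden
    noise bits e_i that only the trapdoor holder effectively knows."""
    a, b, e = [], [], []
    for _ in range(m):
        ai = random.getrandbits(n)
        ei = 1 if random.random() < noise_rate else 0
        bi = (bin(ai & s).count("1") & 1) ^ ei      # parity of <a_i, s>, xor noise
        a.append(ai); b.append(bi); e.append(ei)
    return a, b, e

def solve_gf2(eqs, n: int) -> int:
    """Gauss-Jordan elimination over GF(2) for equations <a_i, s> = b_i,
    each packed as (a_i << 1) | b_i."""
    rows = [(ai << 1) | bi for ai, bi in eqs]
    used = [False] * len(rows)
    for col in range(n - 1, -1, -1):
        bit = 1 << (col + 1)
        idx = next((i for i, r in enumerate(rows) if not used[i] and r & bit), None)
        if idx is None:
            raise ValueError("not full rank; gather more noise-free equations")
        used[idx] = True
        for j in range(len(rows)):
            if j != idx and rows[j] & bit:
                rows[j] ^= rows[idx]
    s = 0
    for i, r in enumerate(rows):
        if used[i] and (r & 1):
            s |= 1 << (r.bit_length() - 2)          # pivot column of this row
    return s

if __name__ == "__main__":
    n, secret = 32, random.getrandbits(32)
    a, b, e = lpn_samples(secret, n, m=200, noise_rate=0.25)
    clean = [(ai, bi) for ai, bi, ei in zip(a, b, e) if ei == 0]
    assert solve_gf2(clean, n) == secret
```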

85 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Atom as mentioned in this paper is an anonymous messaging system that protects against traffic-analysis attacks, where each server touches only a small fraction of the total messages routed through the network, and the system's capacity scales near-linearly with the number of servers.
Abstract: Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, even against an active adversary who monitors the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn. Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees.
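The near-linear scaling claim can be seen in a purely schematic simulation of layered-shuffle routing (all cryptography, verification of malicious servers, and fault handling omitted; the group sizes and layer counts below are arbitrary choices, not Atom's parameters):

```python
# Schematic simulation of the horizontal-scaling idea: messages are split
# across server groups, each group shuffles its local batch and fans it out
# to groups in the next layer, so no group ever handles more than a slice.
import random

def route(messages, num_groups, num_layers, seed=0):
    rng = random.Random(seed)
    batches = [messages[i::num_groups] for i in range(num_groups)]  # initial split
    for _ in range(num_layers):
        next_batches = [[] for _ in range(num_groups)]
        for batch in batches:
            rng.shuffle(batch)                        # local permutation only
            for j, msg in enumerate(batch):
                next_batches[j % num_groups].append(msg)  # fan out to next layer
        batches = next_batches
    return batches

if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(1_000)]
    out = route(msgs, num_groups=32, num_layers=10)
    # each group touches only ~1000/32 messages per layer; in the real system
    # the unlinkability of input and output positions is enforced cryptographically
    print(max(len(b) for b in out), min(len(b) for b in out))
```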

84 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Banshee is a new DRAM cache design that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency and reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy.
Abstract: Placing the DRAM in the same package as a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs optimize mainly for cache hit latency and do not consider bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency. Banshee is based on two key ideas. First, it eliminates the tag lookup overhead by tracking the contents of the DRAM cache using TLBs and page table entries, which is efficiently enabled by a new lightweight TLB coherence protocol we introduce. Second, it reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy. Our evaluations show that Banshee significantly improves performance (15% on average) and reduces DRAM traffic (35.8% on average) over the best-previous latency-optimized DRAM cache design. CCS Concepts: Computer systems organization → Multicore architectures; Heterogeneous (hybrid) systems.
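The replacement idea can be sketched in a few lines. The following is a hedged illustration of a bandwidth-aware, frequency-based policy in the spirit of the abstract (the sampling rate, threshold, and data structures are assumptions for illustration, not Banshee's actual mechanism):

```python
# Sketch of a bandwidth-aware frequency-based replacement policy.
import random

class FreqBasedDramCache:
    def __init__(self, num_slots, sample_rate=0.1, threshold=2):
        self.num_slots = num_slots
        self.sample_rate = sample_rate   # update counters on only a sample of
                                         # misses, limiting metadata traffic
        self.threshold = threshold       # replace only when a candidate is
                                         # clearly hotter than the victim
        self.resident = {}               # cached page -> access counter
        self.candidates = {}             # not-yet-cached page -> access counter

    def access(self, page):
        if page in self.resident:
            self.resident[page] += 1
            return "hit"
        # miss: serve from off-package DRAM, maybe consider caching the page
        if random.random() < self.sample_rate:
            self.candidates[page] = self.candidates.get(page, 0) + 1
            self._maybe_replace(page)
        return "miss"

    def _maybe_replace(self, page):
        if len(self.resident) < self.num_slots:
            self.resident[page] = self.candidates.pop(page)
            return
        victim = min(self.resident, key=self.resident.get)
        # replace only when the benefit clearly outweighs the cost of moving
        # the page into the cache (and updating the page-table/TLB mapping)
        if self.candidates[page] >= self.resident[victim] + self.threshold:
            del self.resident[victim]
            self.resident[page] = self.candidates.pop(page)
```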

69 citations


Posted Content
TL;DR: This work improves the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption and presents a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case.
Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed.
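The quorum intersection arithmetic behind this bound is worth spelling out (a one-line derivation from the sizes stated in the abstract, with $n = 2f+1$):

$$|Q_{\text{post}} \cap Q_{\text{pre}}| \;\ge\; |Q_{\text{post}}| + |Q_{\text{pre}}| - n \;=\; (2f+1) + (f+1) - (2f+1) \;=\; f+1,$$

so with at most $f$ Byzantine replicas, any post-commit quorum and any pre-commit quorum share at least one honest replica.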

36 citations


Book ChapterDOI
12 Nov 2017
TL;DR: In this paper, the authors propose the notion of bandwidth hard functions to reduce an ASIC's energy advantage, observing that the memory hardness approach is an incomplete solution: it only attempts to resist an ASIC's area advantage while overlooking the more important energy advantage.
Abstract: Cryptographic hash functions have wide applications including password hashing, pricing functions for spam and denial-of-service countermeasures and proof of work in cryptocurrencies. Recent progress on ASIC (Application Specific Integrated Circuit) hash engines raises concerns about the security of the above applications. This leads to a growing interest in ASIC resistant hash functions and ASIC resistant proof of work schemes, i.e., those that do not give ASICs a huge advantage. The standard approach towards ASIC resistance today is through memory hard functions or memory hard proof of work schemes. However, we observe that the memory hardness approach is an incomplete solution. It only attempts to provide resistance to an ASIC’s area advantage but overlooks the more important energy advantage. In this paper, we propose the notion of bandwidth hard functions to reduce an ASIC’s energy advantage. CPUs cannot compete with ASICs for energy efficiency in computation, but we can rely on memory accesses to reduce an ASIC’s energy advantage because energy costs of memory accesses are comparable for ASICs and CPUs. We propose a model for hardware energy cost that has sound foundations in practice. We then analyze the bandwidth hardness property of ASIC resistant candidates. We find scrypt, Catena-BRG and Balloon are bandwidth hard with suitable parameters. Lastly, we observe that a capacity hard function is not necessarily bandwidth hard, with a stacked double butterfly graph being a counterexample.
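A back-of-the-envelope version of the hardware energy argument makes the point concrete. The constants below are illustrative placeholders, not the paper's measured numbers: per-operation compute energy differs by a large factor between CPUs and ASICs, while per-byte DRAM access energy is comparable, so a function dominated by memory traffic caps the achievable energy advantage.

```python
# Toy energy model for the bandwidth-hardness argument (illustrative constants).

def energy(ops, mem_bytes, e_op, e_byte):
    """Total energy = compute energy + off-chip memory access energy."""
    return ops * e_op + mem_bytes * e_byte

def asic_energy_advantage(ops, mem_bytes,
                          cpu_op=100.0, asic_op=1.0,      # compute: ~100x gap (assumed)
                          cpu_byte=10.0, asic_byte=8.0):  # DRAM access: comparable
    return energy(ops, mem_bytes, cpu_op, cpu_byte) / \
           energy(ops, mem_bytes, asic_op, asic_byte)

# A compute-bound function hands the ASIC nearly its full compute advantage,
# while a bandwidth-hard function (memory traffic dominates) caps it:
print(asic_energy_advantage(ops=1e9, mem_bytes=1e3))   # ~100x
print(asic_energy_advantage(ops=1e6, mem_bytes=1e9))   # ~1.3x
```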

36 citations


Journal ArticleDOI
09 Dec 2017
TL;DR: The main insight is that “confidence information” does not need to be kept private, if the noise vector is independent of the confidence information, e.g., the bits generated by ring oscillator pairs which are physically placed close to each other.
Abstract: Herder et al. (IEEE Transactions on Dependable and Secure Computing, 2017) designed a new computational fuzzy extractor and physical unclonable function (PUF) challenge-response protocol based on the Learning Parity with Noise (LPN) problem. The protocol requires no irreversible state updates on the PUFs for security, like burning irreversible fuses, and can correct for significant measurement noise when compared to PUFs using a conventional (information theoretical secure) fuzzy extractor. However, Herder et al. did not implement their protocol. In this paper, we give the first implementation of a challenge response protocol based on computational fuzzy extractors. Our main insight is that “confidence information” does not need to be kept private, if the noise vector is independent of the confidence information, e.g., the bits generated by ring oscillator pairs which are physically placed close to each other. This leads to a construction which is a simplified version of the design of Herder et al. (also building on a ring oscillator PUF). Our simplifications allow for a dramatic reduction in area by making a mild security assumption on ring oscillator physical obfuscated key output bits.
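A minimal model of the public confidence information idea (hypothetical numbers and interface, not the paper's FPGA design): each bit is the sign of a ring-oscillator pair's count difference, the magnitude of that difference serves as the confidence, and only the indices of high-confidence pairs are published, which reveals nothing about the bit values when the measurement noise is independent of the sign.

```python
# Illustrative model of deriving POK bits plus public confidence information
# from ring-oscillator (RO) pairs. Counts are simulated; in hardware they would
# come from counting oscillations of physically adjacent ROs.
import random

def make_device(num_pairs, mismatch_sigma=40, seed=1):
    """Fixed per-device manufacturing mismatch of each RO pair (simulated)."""
    rng = random.Random(seed)
    return [rng.gauss(0, mismatch_sigma) for _ in range(num_pairs)]

def measure(device, noise_sigma=5):
    """One noisy readout: bit = sign of a pair's count difference,
    confidence = magnitude of that difference."""
    bits, conf = [], []
    for d in device:
        diff = d + random.gauss(0, noise_sigma)    # fresh noise per measurement
        bits.append(1 if diff > 0 else 0)
        conf.append(abs(diff))
    return bits, conf

def stable_indices(conf, threshold=20):
    """Public 'confidence information': indices of high-confidence pairs.
    These indices leak nothing about the bit values if the noise is
    independent of the sign."""
    return [i for i, c in enumerate(conf) if c >= threshold]

device = make_device(256)
bits, conf = measure(device)
idx = stable_indices(conf)             # can be stored or sent in the clear
key_bits = [bits[i] for i in idx]      # re-derived on every query, no NVM needed
```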

29 citations


Book
13 Jul 2017
TL;DR: Secure Processors Part I: Background, Taxonomy for Secure Enclaves and Intel SGX Architecture
Abstract: Secure Processors Part I: Background, Taxonomy for Secure Enclaves and Intel SGX Architecture

27 citations


Book
13 Jul 2017
TL;DR: The MIT Sanctum processor developed by the authors is introduced: a system designed to offer stronger security guarantees, lend itself better to analysis and formal verification, and offer a more straightforward and complete threat model than the Intel system, all with an equivalent programming model.
Abstract: This manuscript is the second in a two-part survey and analysis of the state of the art in secure processor systems, with a specific focus on remote software attestation and software isolation. The first part established the taxonomy and prerequisite concepts relevant to an examination of the state of the art in trusted remote computation: attested software isolation containers (enclaves). This second part extends Part I’s description of Intel’s Software Guard Extensions (SGX), an available and documented enclave-capable system, with a rigorous security analysis of SGX as a system for trusted remote computation. This part documents the authors’ concerns over the shortcomings of SGX as a secure system and introduces the MIT Sanctum processor developed by the authors: a system designed to offer stronger security guarantees, lend itself better to analysis and formal verification, and offer a more straightforward and complete threat model than the Intel system, all with an equivalent programming model. This two-part work advocates a principled, transparent, and well-scrutinized approach to system design, and argues that practical guarantees of privacy and integrity for remote computation are achievable at a reasonable design cost and performance overhead.

21 citations


Posted Content
TL;DR: In this paper, the authors improve the Byzantine fault tolerance to $n=2f+1$ by utilizing the synchrony assumption and present protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting.
Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance to $n=2f+1$ by utilizing the synchrony assumption. The key challenge is to ensure a quorum intersection at one \emph{honest} replica. Our solution is to rely on the synchrony assumption to form a \emph{post-commit} quorum of size $2f+1$, which intersects at $f+1$ replicas with any \emph{pre-commit} quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in fewer rounds than the best existing solution (Katz and Koo, 2006). A challenge in this direction is to handle non-simultaneous termination, which we solve by introducing a notion of \emph{virtual} participation after termination. Our protocols may be applied to build practical synchronous Byzantine fault tolerant systems and improve cryptographic protocols such as secure multiparty computation and cryptocurrencies when synchrony can be assumed.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper improves the Byzantine fault tolerance to n = 2f + 1 by utilizing the synchrony assumption and solves synchronous authenticated Byzantine agreement in fewer expected rounds than the best existing solution.
Abstract: This paper presents new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The PBFT state machine replication protocol tolerates f Byzantine faults in an asynchronous setting using n = 3f + 1 replicas. We improve the Byzantine fault tolerance to n = 2f + 1 by utilizing the synchrony assumption. Our protocol also solves synchronous authenticated Byzantine agreement in fewer expected rounds than the best existing solution (Katz and Koo, 2006).


Book ChapterDOI
12 Nov 2017
TL;DR: In this paper, the authors present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and learning parity with noise (LPN) problems.
Abstract: Iterative collision search procedures play a key role in developing combinatorial algorithms for the subset sum and learning parity with noise (LPN) problems. In both scenarios, the single-list pair-wise iterative collision search finds the most solutions and offers the best efficiency. However, due to its complex probabilistic structure, no rigorous analysis for it appears to be available to the best of our knowledge. As a result, theoretical works often resort to overly constrained and sub-optimal iterative collision search variants in exchange for analytic simplicity. In this paper, we present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN. In the LPN literature, the method is known as the LF2 heuristic. Besides LF2, we also present rigorous analysis of other LPN solving heuristics and show that they work well when combined with LF2. Putting it together, we significantly narrow the gap between theoretical and heuristic algorithms for LPN.
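For readers unfamiliar with the heuristic being analyzed, here is a compact, purely illustrative version of one single-list pair-wise collision round in the LF2 style: samples are bucketed on their last b coordinates, and every pair within a bucket is XORed, producing new samples whose last b coordinates are zero.

```python
# One round of single-list pair-wise collision search (LF2-style), written for
# clarity rather than performance. Vectors are packed into integers; a
# "collision" means agreeing on the last b bits, and XORing a colliding pair
# cancels those bits while XORing the associated labels as well.
from collections import defaultdict
from itertools import combinations

def lf2_round(samples, b):
    """samples: list of (a, z) pairs, a an integer vector, z its label bit(s).
    Returns all pairwise XORs of samples that collide on the last b bits."""
    mask = (1 << b) - 1
    buckets = defaultdict(list)
    for a, z in samples:
        buckets[a & mask].append((a, z))
    out = []
    for bucket in buckets.values():
        for (a1, z1), (a2, z2) in combinations(bucket, 2):
            out.append((a1 ^ a2, z1 ^ z2))     # last b bits are now zero
    return out
```

A bucket of size k contributes k(k-1)/2 pairs, which is why this variant finds more (but statistically dependent) samples per round than variants that pair each element at most once; quantifying that dependence is what the paper's analysis addresses.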

Proceedings ArticleDOI
07 Jun 2017
TL;DR: This framework introduces productive software design guidelines which enable a guarded environment to execute sensitive policy checking code - hence enforcing application control flow integrity - and afford flexibility to the application designer to construct appropriate high-level policies to customize policy checker software.
Abstract: Critical resource sharing among multiple entities in a processing system is inevitable, which in turn calls for the presence of appropriate authentication and access control mechanisms. Generally speaking, these mechanisms are implemented via trusted software "policy checkers" that enforce certain high level application-specific "rules" to enforce a policy. Whether implemented as operating system modules or embedded inside the application ad hoc, these policy checkers expose additional attack surface beyond the application logic. In order to protect application software from an adversary, modern secure processing platforms, such as Intel's Software Guard Extensions (SGX), employ principled hardware isolation to offer secure software containers or enclaves to execute trusted sensitive code with some integrity and privacy guarantees against a privileged software adversary. We extend this model further and propose using these hardware isolation mechanisms to shield the authentication and access control logic essential to policy checker software. While relying on the fundamental features of modern secure processors, our framework introduces productive software design guidelines which enable a guarded environment to execute sensitive policy checking code - hence enforcing application control flow integrity - and afford flexibility to the application designer to construct appropriate high-level policies to customize policy checker software.
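The design guideline amounts to funneling every security decision through a small, isolated policy checker with a narrow interface. The skeleton below shows only the software structure (a hedged illustration; in the proposed framework the checker would run inside an SGX-style enclave and authenticate callers with attested credentials rather than the string placeholder used here).

```python
# Schematic separation of application logic from a guarded policy checker.
# In a real deployment the checker would live inside a hardware-isolated
# enclave behind an attested call gate; here it is an ordinary object, so
# only the software structure is illustrated.

class PolicyChecker:
    """Holds the high-level rules; nothing else in the application can
    read or modify them directly."""
    def __init__(self, rules):
        self._rules = dict(rules)          # e.g. {"payroll.db": {"alice"}}

    def authorize(self, principal: str, resource: str, token: str) -> bool:
        # authentication placeholder: a real checker would verify a MAC,
        # signature, or attestation report rather than compare strings
        if token != f"token-for-{principal}":
            return False
        return principal in self._rules.get(resource, set())

def read_resource(checker: PolicyChecker, principal, resource, token):
    # the only path from application code to the resource goes through the
    # checker, which is what the enclave boundary would enforce in hardware
    if not checker.authorize(principal, resource, token):
        raise PermissionError(f"{principal} may not access {resource}")
    return open(resource, "rb").read()
```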

Journal Article
TL;DR: This work directly constructs public-key encryption and digital signature algorithms with noisy keys based on a weaker model of graded encoding, and uses the computational fuzzy vault to construct the first reusable fuzzy extractor supporting a linear fraction of errors.
Abstract: Passwords bootstrap symmetric and asymmetric cryptography, tying keys to an individual user. Biometrics are intended to strengthen this tie. Unfortunately, biometrics exhibit noise between repeated readings. Fuzzy extractors (Dodis et al., Eurocrypt 2004) derive stable symmetric keys from noisy sources. We ask if it is also possible for noisy sources to directly replace private keys in asymmetric cryptosystems. We propose a new primitive called public-key cryptosystems with noisy keys. Such a cryptosystem functions when the private key varies according to some metric. An intuitive solution is to combine a fuzzy extractor with a public key cryptosystem. Unfortunately, fuzzy extractors need static helper information to account for noise. This helper information creates fundamental limitations on the resulting cryptosystems. To overcome these limitations, we directly construct public-key encryption and digital signature algorithms with noisy keys. The core of our constructions is a computational version of the fuzzy vault (Juels and Sudan, Designs, Codes, and Cryptography 2006). Security of our schemes is based on graded encoding schemes (Garg et al., Eurocrypt 2013, Garg et al., TCC 2016). Importantly, our public-key encryption algorithm is based on a weaker model of graded encoding. If functional encryption or indistinguishability obfuscation exist in this weaker model, they also exist in the standard model. In addition, we use the computational fuzzy vault to construct the first reusable fuzzy extractor (Boyen, CCS 2004) supporting a linear fraction of errors.

Posted Content
TL;DR: Banshee as mentioned in this paper is a new DRAM cache design that optimizes for both in- and off-package DRAM bandwidth efficiency, treated as a first-class design constraint, without degrading access latency.
Abstract: Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in- and off-package DRAM bandwidth efficiency without degrading access latency. The key ideas are to eliminate the in-package DRAM bandwidth overheads due to costly tag accesses through the virtual memory mechanism and to incorporate a bandwidth-aware frequency-based replacement policy that is biased to reduce unnecessary traffic to off-package DRAM. Our extensive evaluation shows that Banshee provides significant performance improvement and traffic reduction over state-of-the-art latency-optimized DRAM cache designs.

Journal ArticleDOI
TL;DR: PriviPK uniquely combines important privacy properties such as forward secrecy, deniability (or non-deniability if desired), and user transparency while avoiding the administrative overhead of certificates for asynchronous communication.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This work presents ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware, and designs DVFS controllers that exploit ThreadBeats information for two purposes: improving performance by equalizing thread progress and minimizing runtime under a power budget constraint.
Abstract: Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accelerate to achieve good performance while maintaining energy efficiency or operating under a power budget. Navigating these tradeoffs is especially hard with multi-threaded applications, where performance depends on the relative progress of parallel worker threads between synchronization points. Prior work on chip-level power management for multi-threaded applications has largely relied on indirect heuristics and metrics calculated from low-level performance counters to estimate each thread's progress. However, these indirect metrics are often inaccurate. Instead, we propose to gather progress information directly from software itself. We present ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware. We design DVFS controllers that exploit ThreadBeats information for two purposes: (i) improving performance by equalizing thread progress and (ii) minimizing runtime under a power budget constraint. These controllers reduce wait time at barriers by 77% on average and improve energy-delay product under a power budget by 23% over prior work.
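To give a flavor of what such an annotation might look like, here is a mocked-up, software-only sketch (the API shape and names are guesses for illustration, not ThreadBeats' actual interface; in the real system the beat would be conveyed to hardware and consumed by the DVFS controller).

```python
# Mock of a ThreadBeats-style progress annotation. In the real system the beat
# would be conveyed to hardware (e.g. via a register write); here beats are
# simply counted per thread in software so a controller-like policy can be shown.
import threading
from collections import Counter

_beats = Counter()
_lock = threading.Lock()

def threadbeat():
    """Annotation a worker calls once per unit of work (e.g. per loop chunk)."""
    with _lock:
        _beats[threading.get_ident()] += 1

def lagging_threads():
    """A DVFS controller would boost the threads furthest behind the leader."""
    with _lock:
        if not _beats:
            return []
        lead = max(_beats.values())
        return [tid for tid, n in _beats.items() if n < lead]

def worker(chunks):
    for _ in range(chunks):
        pass            # ... one chunk of real work would go here ...
        threadbeat()    # report progress after each chunk

threads = [threading.Thread(target=worker, args=(n,)) for n in (100, 80, 120)]
for t in threads: t.start()
for t in threads: t.join()
print(lagging_threads())
```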

Posted Content
TL;DR: This paper presents rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN, and also presents rigorous analysis of other LPN solving heuristics, showing that they work well when combined with LF2.
Abstract: Iterative collision search procedures play a key role in developing combinatorial algorithms for the subset sum and learning parity with noise (LPN) problems. In both scenarios, the single-list pair-wise iterative collision search finds the most solutions and offers the best efficiency. However, due to its complex probabilistic structure, no rigorous analysis for it appears to be available to the best of our knowledge. As a result, theoretical works often resort to overly constrained and sub-optimal iterative collision search variants in exchange for analytic simplicity. In this paper, we present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN. In the LPN literature, the method is known as the LF2 heuristic. Besides LF2, we also present rigorous analysis of other LPN solving heuristics and show that they work well when combined with LF2. Putting it together, we significantly narrow the gap between theoretical and heuristic algorithms for LPN.