
Showing papers by "Srinivas Devadas" published in 2017


Proceedings ArticleDOI
22 May 2017
TL;DR: Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of application-specific statements managed by an adversarial server, and decreases the bandwidth requirements of log auditors from 90GB to only tens of megabytes.
Abstract: We present Catena, an efficiently-verifiable Bitcoin witnessing scheme. Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of application-specific statements managed by an adversarial server. Catena implements a log as an OP_RETURN transaction chain and prevents forks in the log by leveraging Bitcoin's security against double spends. Specifically, if a log server wants to equivocate it has to double spend a Bitcoin transaction output. Thus, Catena logs are as hard to fork as the Bitcoin blockchain: an adversary without a large fraction of the network's computational power cannot fork Bitcoin and thus cannot fork a Catena log either. However, different from previous Bitcoin-based work, Catena decreases the bandwidth requirements of log auditors from 90GB to only tens of megabytes. More precisely, our clients only need to download all Bitcoin block headers (currently less than 35 MB) and a small, 600-byte proof for each statement in a block. We implement Catena in Java using the bitcoinj library and use it to extend CONIKS, a recent key transparency scheme, to witness its public-key directory in the Bitcoin blockchain, where it can be efficiently verified by auditors. We show that Catena can secure many systems today, such as public-key directories, Tor directory servers and software transparency schemes.
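As a rough illustration of the client-side audit described above, the sketch below walks a hypothetical chain of OP_RETURN transactions and checks that each statement's transaction spends the previous one's output and is committed under a known block header via a Merkle proof. The data model and function names are assumptions for illustration, not Catena's actual API.

```python
# Minimal sketch of Catena-style log auditing (hypothetical data model,
# not the paper's implementation). A thin client holds only the Bitcoin
# block headers and, per statement, a small transaction inclusion proof.
from dataclasses import dataclass
from hashlib import sha256
from typing import List, Tuple

def dbl_sha256(data: bytes) -> bytes:
    return sha256(sha256(data).digest()).digest()

@dataclass
class CatenaTx:
    txid: bytes               # hash of the serialized transaction
    prev_txid: bytes          # output of the previous log transaction it spends
    statement: bytes          # application statement carried in OP_RETURN
    merkle_branch: List[Tuple[bytes, bool]]  # (sibling hash, sibling-is-right)
    block_merkle_root: bytes  # Merkle root from the containing block header

def merkle_root_from_branch(txid: bytes, branch) -> bytes:
    h = txid
    for sibling, sibling_is_right in branch:
        pair = h + sibling if sibling_is_right else sibling + h
        h = dbl_sha256(pair)
    return h

def audit_log(genesis_txid: bytes, chain: List[CatenaTx], known_roots: set) -> bool:
    """Accept the log only if each transaction spends its predecessor's output
    and is anchored under a Merkle root taken from a known block header."""
    prev = genesis_txid
    for tx in chain:
        if tx.prev_txid != prev:
            return False                      # chain is not linear
        if tx.block_merkle_root not in known_roots:
            return False                      # not committed in a known header
        if merkle_root_from_branch(tx.txid, tx.merkle_branch) != tx.block_merkle_root:
            return False                      # Merkle proof does not check out
        prev = tx.txid
    return True
```

The point of the chaining check is that producing two valid successors of the same prev_txid would require a Bitcoin double spend, which is exactly the fork-prevention argument in the abstract.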

130 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: This work introduces a verification methodology based on a trusted abstract platform (TAP), a formalization of idealized enclave platforms along with a parameterized adversary, and formalizes the notion of secure remote execution.
Abstract: Recent proposals for trusted hardware platforms, such as Intel SGX and the MIT Sanctum processor, offer compelling security features but lack formal guarantees. We introduce a verification methodology based on a trusted abstract platform (TAP), a formalization of idealized enclave platforms along with a parameterized adversary. We also formalize the notion of secure remote execution and present machine-checked proofs showing that the TAP satisfies the three key security properties that entail secure remote execution: integrity, confidentiality and secure measurement. We then present machine-checked proofs showing that SGX and Sanctum are refinements of the TAP under certain parameterizations of the adversary, demonstrating that these systems implement secure enclaves for the stated adversary models.

113 citations


Journal ArticleDOI
TL;DR: A fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a “noise-avoiding trapdoor” is presented.
Abstract: We present a fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a “noise-avoiding trapdoor.” Using this computational fuzzy extractor, we present a stateless construction of a cryptographically-secure Physical Unclonable Function. Our construct requires no non-volatile (permanent) storage, secure or otherwise, and its computational security can be reduced to the hardness of an LPN variant under the random oracle model. The construction is “stateless,” because there is no information stored between subsequent queries, which mitigates attacks against the PUF via tampering. Moreover, our stateless construction corresponds to a PUF whose outputs are free of noise because of internal error-correcting capability, which enables a host of applications beyond authentication. We describe the construction, provide a proof of computational security and an analysis of the security parameter for system parameter choices, and present experimental evidence that the construction is practical and reliable under a wide environmental range.
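To make the LPN connection concrete, here is a toy illustration (my own sketch, not the paper's construction) of the noise-avoiding trapdoor idea: LPN samples $b_i = \langle a_i, s\rangle \oplus e_i$ hide the secret $s$, but a decoder that knows which equations are noise-free can recover $s$ by Gaussian elimination over GF(2).

```python
# Toy illustration of the "noise-avoiding trapdoor" idea behind the LPN-based
# fuzzy extractor (not the paper's construction). Vectors are packed into ints.
import random

def lpn_samples(s: int, n: int, m: int, noise_rate: float):
    """Return m samples (a_i, b_i) over an n-bit secret s, plus the hidden
    noise bits e_i that only the trapdoor holder effectively knows."""
    a, b, e = [], [], []
    for _ in range(m):
        ai = random.getrandbits(n)
        ei = 1 if random.random() < noise_rate else 0
        bi = (bin(ai & s).count("1") & 1) ^ ei      # parity of <a_i, s>, xor noise
        a.append(ai); b.append(bi); e.append(ei)
    return a, b, e

def solve_gf2(eqs, n: int) -> int:
    """Gauss-Jordan elimination over GF(2) for equations <a_i, s> = b_i,
    each packed as (a_i << 1) | b_i."""
    rows = [(ai << 1) | bi for ai, bi in eqs]
    used = [False] * len(rows)
    for col in range(n - 1, -1, -1):
        bit = 1 << (col + 1)
        idx = next((i for i, r in enumerate(rows) if not used[i] and r & bit), None)
        if idx is None:
            raise ValueError("not full rank; gather more noise-free equations")
        used[idx] = True
        for j in range(len(rows)):
            if j != idx and rows[j] & bit:
                rows[j] ^= rows[idx]
    s = 0
    for i, r in enumerate(rows):
        if used[i] and (r & 1):
            s |= 1 << (r.bit_length() - 2)          # pivot column of this row
    return s

if __name__ == "__main__":
    n, secret = 32, random.getrandbits(32)
    a, b, e = lpn_samples(secret, n, m=200, noise_rate=0.25)
    clean = [(ai, bi) for ai, bi, ei in zip(a, b, e) if ei == 0]
    assert solve_gf2(clean, n) == secret
```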

85 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Atom as mentioned in this paper is an anonymous messaging system that protects against traffic-analysis attacks, where each server touches only a small fraction of the total messages routed through the network, and the system's capacity scales near-linearly with the number of servers.
Abstract: Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, even against an active adversary who monitors the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn. Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees.
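The near-linear scaling claim can be seen in a purely schematic simulation of layered-shuffle routing (all cryptography, verification of malicious servers, and fault handling omitted; the group sizes and layer counts below are arbitrary choices, not Atom's parameters):

```python
# Schematic simulation of the horizontal-scaling idea: messages are split
# across server groups, each group shuffles its local batch and fans it out
# to groups in the next layer, so no group ever handles more than a slice.
import random

def route(messages, num_groups, num_layers, seed=0):
    rng = random.Random(seed)
    batches = [messages[i::num_groups] for i in range(num_groups)]  # initial split
    for _ in range(num_layers):
        next_batches = [[] for _ in range(num_groups)]
        for batch in batches:
            rng.shuffle(batch)                        # local permutation only
            for j, msg in enumerate(batch):
                next_batches[j % num_groups].append(msg)  # fan out to next layer
        batches = next_batches
    return batches

if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(1_000)]
    out = route(msgs, num_groups=32, num_layers=10)
    # each group touches only ~1000/32 messages per layer; in the real system
    # the unlinkability of input and output positions is enforced cryptographically
    print(max(len(b) for b in out), min(len(b) for b in out))
```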

84 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Banshee is a new DRAM cache design that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency and reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy.
Abstract: Placing the DRAM in the same package as a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs optimize mainly for cache hit latency and do not consider bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency. Banshee is based on two key ideas. First, it eliminates the tag lookup overhead by tracking the contents of the DRAM cache using TLBs and page table entries, which is efficiently enabled by a new lightweight TLB coherence protocol we introduce. Second, it reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy. Our evaluations show that Banshee significantly improves performance (15% on average) and reduces DRAM traffic (35.8% on average) over the best-previous latency-optimized DRAM cache design. CCS Concepts: Computer systems organization → Multicore architectures; Heterogeneous (hybrid) systems.
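The replacement idea can be sketched in a few lines. The following is a hedged illustration of a bandwidth-aware, frequency-based policy in the spirit of the abstract (the sampling rate, threshold, and data structures are assumptions for illustration, not Banshee's actual mechanism):

```python
# Sketch of a bandwidth-aware frequency-based replacement policy.
import random

class FreqBasedDramCache:
    def __init__(self, num_slots, sample_rate=0.1, threshold=2):
        self.num_slots = num_slots
        self.sample_rate = sample_rate   # update counters on only a sample of
                                         # misses, limiting metadata traffic
        self.threshold = threshold       # replace only when a candidate is
                                         # clearly hotter than the victim
        self.resident = {}               # cached page -> access counter
        self.candidates = {}             # not-yet-cached page -> access counter

    def access(self, page):
        if page in self.resident:
            self.resident[page] += 1
            return "hit"
        # miss: serve from off-package DRAM, maybe consider caching the page
        if random.random() < self.sample_rate:
            self.candidates[page] = self.candidates.get(page, 0) + 1
            self._maybe_replace(page)
        return "miss"

    def _maybe_replace(self, page):
        if len(self.resident) < self.num_slots:
            self.resident[page] = self.candidates.pop(page)
            return
        victim = min(self.resident, key=self.resident.get)
        # replace only when the benefit clearly outweighs the cost of moving
        # the page into the cache (and updating the page-table/TLB mapping)
        if self.candidates[page] >= self.resident[victim] + self.threshold:
            del self.resident[victim]
            self.resident[page] = self.candidates.pop(page)
```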

69 citations


Posted Content
TL;DR: This work improves the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption and presents a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case.
Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed.
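The quorum intersection arithmetic behind this bound is worth spelling out (a one-line derivation from the sizes stated in the abstract, with $n = 2f+1$):

$$|Q_{\text{post}} \cap Q_{\text{pre}}| \;\ge\; |Q_{\text{post}}| + |Q_{\text{pre}}| - n \;=\; (2f+1) + (f+1) - (2f+1) \;=\; f+1,$$

so with at most $f$ Byzantine replicas, any post-commit quorum and any pre-commit quorum share at least one honest replica.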

36 citations


Book ChapterDOI
12 Nov 2017
TL;DR: In this paper, the authors propose the notion of bandwidth hard functions to reduce an ASIC's energy advantage, observing that the memory hardness approach is an incomplete solution: it only attempts to resist an ASIC's area advantage while overlooking the more important energy advantage.
Abstract: Cryptographic hash functions have wide applications including password hashing, pricing functions for spam and denial-of-service countermeasures and proof of work in cryptocurrencies. Recent progress on ASIC (Application Specific Integrated Circuit) hash engines raises concerns about the security of the above applications. This leads to a growing interest in ASIC resistant hash functions and ASIC resistant proof of work schemes, i.e., those that do not give ASICs a huge advantage. The standard approach towards ASIC resistance today is through memory hard functions or memory hard proof of work schemes. However, we observe that the memory hardness approach is an incomplete solution. It only attempts to provide resistance to an ASIC’s area advantage but overlooks the more important energy advantage. In this paper, we propose the notion of bandwidth hard functions to reduce an ASIC’s energy advantage. CPUs cannot compete with ASICs for energy efficiency in computation, but we can rely on memory accesses to reduce an ASIC’s energy advantage because energy costs of memory accesses are comparable for ASICs and CPUs. We propose a model for hardware energy cost that has sound foundations in practice. We then analyze the bandwidth hardness property of ASIC resistant candidates. We find scrypt, Catena-BRG and Balloon are bandwidth hard with suitable parameters. Lastly, we observe that a capacity hard function is not necessarily bandwidth hard, with a stacked double butterfly graph being a counterexample.
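A back-of-the-envelope version of the hardware energy argument makes the point concrete. The constants below are illustrative placeholders, not the paper's measured numbers: per-operation compute energy differs by a large factor between CPUs and ASICs, while per-byte DRAM access energy is comparable, so a function dominated by memory traffic caps the achievable energy advantage.

```python
# Toy energy model for the bandwidth-hardness argument (illustrative constants).

def energy(ops, mem_bytes, e_op, e_byte):
    """Total energy = compute energy + off-chip memory access energy."""
    return ops * e_op + mem_bytes * e_byte

def asic_energy_advantage(ops, mem_bytes,
                          cpu_op=100.0, asic_op=1.0,      # compute: ~100x gap (assumed)
                          cpu_byte=10.0, asic_byte=8.0):  # DRAM access: comparable
    return energy(ops, mem_bytes, cpu_op, cpu_byte) / \
           energy(ops, mem_bytes, asic_op, asic_byte)

# A compute-bound function hands the ASIC nearly its full compute advantage,
# while a bandwidth-hard function (memory traffic dominates) caps it:
print(asic_energy_advantage(ops=1e9, mem_bytes=1e3))   # ~100x
print(asic_energy_advantage(ops=1e6, mem_bytes=1e9))   # ~1.3x
```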

36 citations


Journal ArticleDOI
09 Dec 2017
TL;DR: The main insight is that “confidence information” does not need to be kept private, if the noise vector is independent of the confidence information, e.g., the bits generated by ring oscillator pairs which are physically placed close to each other.
Abstract: Herder et al. (IEEE Transactions on Dependable and Secure Computing, 2017) designed a new computational fuzzy extractor and physical unclonable function (PUF) challenge-response protocol based on the Learning Parity with Noise (LPN) problem. The protocol requires no irreversible state updates on the PUFs for security, like burning irreversible fuses, and can correct for significant measurement noise when compared to PUFs using a conventional (information theoretical secure) fuzzy extractor. However, Herder et al. did not implement their protocol. In this paper, we give the first implementation of a challenge response protocol based on computational fuzzy extractors. Our main insight is that “confidence information” does not need to be kept private, if the noise vector is independent of the confidence information, e.g., the bits generated by ring oscillator pairs which are physically placed close to each other. This leads to a construction which is a simplified version of the design of Herder et al. (also building on a ring oscillator PUF). Our simplifications allow for a dramatic reduction in area by making a mild security assumption on ring oscillator physical obfuscated key output bits.
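A minimal model of the public confidence information idea (hypothetical numbers and interface, not the paper's FPGA design): each bit is the sign of a ring-oscillator pair's count difference, the magnitude of that difference serves as the confidence, and only the indices of high-confidence pairs are published, which reveals nothing about the bit values when the measurement noise is independent of the sign.

```python
# Illustrative model of deriving POK bits plus public confidence information
# from ring-oscillator (RO) pairs. Counts are simulated; in hardware they would
# come from counting oscillations of physically adjacent ROs.
import random

def make_device(num_pairs, mismatch_sigma=40, seed=1):
    """Fixed per-device manufacturing mismatch of each RO pair (simulated)."""
    rng = random.Random(seed)
    return [rng.gauss(0, mismatch_sigma) for _ in range(num_pairs)]

def measure(device, noise_sigma=5):
    """One noisy readout: bit = sign of a pair's count difference,
    confidence = magnitude of that difference."""
    bits, conf = [], []
    for d in device:
        diff = d + random.gauss(0, noise_sigma)    # fresh noise per measurement
        bits.append(1 if diff > 0 else 0)
        conf.append(abs(diff))
    return bits, conf

def stable_indices(conf, threshold=20):
    """Public 'confidence information': indices of high-confidence pairs.
    These indices leak nothing about the bit values if the noise is
    independent of the sign."""
    return [i for i, c in enumerate(conf) if c >= threshold]

device = make_device(256)
bits, conf = measure(device)
idx = stable_indices(conf)             # can be stored or sent in the clear
key_bits = [bits[i] for i in idx]      # re-derived on every query, no NVM needed
```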

29 citations


Book
13 Jul 2017
TL;DR: Secure Processors Part I: Background, Taxonomy for Secure Enclaves and Intel SGX Architecture
Abstract: Secure Processors Part I: Background, Taxonomy for Secure Enclaves and Intel SGX Architecture

27 citations


Book
13 Jul 2017
TL;DR: The MIT Sanctum processor developed by the authors is introduced: a system designed to offer stronger security guarantees, lend itself better to analysis and formal verification, and offer a more straightforward and complete threat model than the Intel system, all with an equivalent programming model.
Abstract: This manuscript is the second in a two-part survey and analysis of the state of the art in secure processor systems, with a specific focus on remote software attestation and software isolation. The first part established the taxonomy and prerequisite concepts relevant to an examination of the state of the art in trusted remote computation: attested software isolation containers (enclaves). This second part extends Part I’s description of Intel’s Software Guard Extensions (SGX), an available and documented enclave-capable system, with a rigorous security analysis of SGX as a system for trusted remote computation. This part documents the authors’ concerns over the shortcomings of SGX as a secure system and introduces the MIT Sanctum processor developed by the authors: a system designed to offer stronger security guarantees, lend itself better to analysis and formal verification, and offer a more straightforward and complete threat model than the Intel system, all with an equivalent programming model. This two-part work advocates a principled, transparent, and well-scrutinized approach to system design, and argues that practical guarantees of privacy and integrity for remote computation are achievable at a reasonable design cost and performance overhead.

21 citations


Posted Content
TL;DR: In this paper, the authors improve the Byzantine fault tolerance to $n=2f+1$ by utilizing the synchrony assumption and present protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting.
Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance to $n=2f+1$ by utilizing the synchrony assumption. The key challenge is to ensure a quorum intersection at one \emph{honest} replica. Our solution is to rely on the synchrony assumption to form a \emph{post-commit} quorum of size $2f+1$, which intersects at $f+1$ replicas with any \emph{pre-commit} quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in fewer rounds than the best existing solution (Katz and Koo, 2006). A challenge in this direction is to handle non-simultaneous termination, which we solve by introducing a notion of \emph{virtual} participation after termination. Our protocols may be applied to build practical synchronous Byzantine fault tolerant systems and improve cryptographic protocols such as secure multiparty computation and cryptocurrencies when synchrony can be assumed.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper improves the Byzantine fault tolerance to n = 2f + 1 by utilizing the synchrony assumption and solves synchronous authenticated Byzantine agreement in fewer expected rounds than the best existing solution.
Abstract: This paper presents new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The PBFT state machine replication protocol tolerates f Byzantine faults in an asynchronous setting using n = 3f + 1 replicas. We improve the Byzantine fault tolerance to n = 2f + 1 by utilizing the synchrony assumption. Our protocol also solves synchronous authenticated Byzantine agreement in fewer expected rounds than the best existing solution (Katz and Koo, 2006).


Book ChapterDOI
12 Nov 2017
TL;DR: In this paper, the authors present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and learning parity with noise (LPN) problems.
Abstract: Iterative collision search procedures play a key role in developing combinatorial algorithms for the subset sum and learning parity with noise (LPN) problems. In both scenarios, the single-list pair-wise iterative collision search finds the most solutions and offers the best efficiency. However, due to its complex probabilistic structure, no rigorous analysis for it appears to be available to the best of our knowledge. As a result, theoretical works often resort to overly constrained and sub-optimal iterative collision search variants in exchange for analytic simplicity. In this paper, we present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN. In the LPN literature, the method is known as the LF2 heuristic. Besides LF2, we also present rigorous analysis of other LPN solving heuristics and show that they work well when combined with LF2. Putting it together, we significantly narrow the gap between theoretical and heuristic algorithms for LPN.
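For readers unfamiliar with the heuristic being analyzed, here is a compact, purely illustrative version of one single-list pair-wise collision round in the LF2 style: samples are bucketed on their last b coordinates, and every pair within a bucket is XORed, producing new samples whose last b coordinates are zero.

```python
# One round of single-list pair-wise collision search (LF2-style), written for
# clarity rather than performance. Vectors are packed into integers; a
# "collision" means agreeing on the last b bits, and XORing a colliding pair
# cancels those bits while XORing the associated labels as well.
from collections import defaultdict
from itertools import combinations

def lf2_round(samples, b):
    """samples: list of (a, z) pairs, a an integer vector, z its label bit(s).
    Returns all pairwise XORs of samples that collide on the last b bits."""
    mask = (1 << b) - 1
    buckets = defaultdict(list)
    for a, z in samples:
        buckets[a & mask].append((a, z))
    out = []
    for bucket in buckets.values():
        for (a1, z1), (a2, z2) in combinations(bucket, 2):
            out.append((a1 ^ a2, z1 ^ z2))     # last b bits are now zero
    return out
```

A bucket of size k contributes k(k-1)/2 pairs, which is why this variant finds more (but statistically dependent) samples per round than variants that pair each element at most once; quantifying that dependence is what the paper's analysis addresses.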

Proceedings ArticleDOI
07 Jun 2017
TL;DR: This framework introduces productive software design guidelines which enable a guarded environment to execute sensitive policy checking code - hence enforcing application control flow integrity - and afford flexibility to the application designer to construct appropriate high-level policies to customize policy checker software.
Abstract: Critical resource sharing among multiple entities in a processing system is inevitable, which in turn calls for the presence of appropriate authentication and access control mechanisms. Generally speaking, these mechanisms are implemented via trusted software "policy checkers" that enforce certain high level application-specific "rules" to enforce a policy. Whether implemented as operating system modules or embedded inside the application ad hoc, these policy checkers expose additional attack surface beyond the application logic. In order to protect application software from an adversary, modern secure processing platforms, such as Intel's Software Guard Extensions (SGX), employ principled hardware isolation to offer secure software containers or enclaves to execute trusted sensitive code with some integrity and privacy guarantees against a privileged software adversary. We extend this model further and propose using these hardware isolation mechanisms to shield the authentication and access control logic essential to policy checker software. While relying on the fundamental features of modern secure processors, our framework introduces productive software design guidelines which enable a guarded environment to execute sensitive policy checking code - hence enforcing application control flow integrity - and afford flexibility to the application designer to construct appropriate high-level policies to customize policy checker software.
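The design guideline amounts to funneling every security decision through a small, isolated policy checker with a narrow interface. The skeleton below shows only the software structure (a hedged illustration; in the proposed framework the checker would run inside an SGX-style enclave and authenticate callers with attested credentials rather than the string placeholder used here).

```python
# Schematic separation of application logic from a guarded policy checker.
# In a real deployment the checker would live inside a hardware-isolated
# enclave behind an attested call gate; here it is an ordinary object, so
# only the software structure is illustrated.

class PolicyChecker:
    """Holds the high-level rules; nothing else in the application can
    read or modify them directly."""
    def __init__(self, rules):
        self._rules = dict(rules)          # e.g. {"payroll.db": {"alice"}}

    def authorize(self, principal: str, resource: str, token: str) -> bool:
        # authentication placeholder: a real checker would verify a MAC,
        # signature, or attestation report rather than compare strings
        if token != f"token-for-{principal}":
            return False
        return principal in self._rules.get(resource, set())

def read_resource(checker: PolicyChecker, principal, resource, token):
    # the only path from application code to the resource goes through the
    # checker, which is what the enclave boundary would enforce in hardware
    if not checker.authorize(principal, resource, token):
        raise PermissionError(f"{principal} may not access {resource}")
    return open(resource, "rb").read()
```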

Journal Article
TL;DR: This work directly constructs public-key encryption and digital signature algorithms with noisy keys based on a weaker model of graded encoding, and uses the computational fuzzy vault to construct the first reusable fuzzy extractor supporting a linear fraction of errors.
Abstract: Passwords bootstrap symmetric and asymmetric cryptography, tying keys to an individual user. Biometrics are intended to strengthen this tie. Unfortunately, biometrics exhibit noise between repeated readings. Fuzzy extractors (Dodis et al., Eurocrypt 2004) derive stable symmetric keys from noisy sources. We ask if it is also possible for noisy sources to directly replace private keys in asymmetric cryptosystems. We propose a new primitive called public-key cryptosystems with noisy keys. Such a cryptosystem functions when the private key varies according to some metric. An intuitive solution is to combine a fuzzy extractor with a public key cryptosystem. Unfortunately, fuzzy extractors need static helper information to account for noise. This helper information creates fundamental limitations on the resulting cryptosystems. To overcome these limitations, we directly construct public-key encryption and digital signature algorithms with noisy keys. The core of our constructions is a computational version of the fuzzy vault (Juels and Sudan, Designs, Codes, and Cryptography 2006). Security of our schemes is based on graded encoding schemes (Garg et al., Eurocrypt 2013, Garg et al., TCC 2016). Importantly, our public-key encryption algorithm is based on a weaker model of graded encoding. If functional encryption or indistinguishability obfuscation exist in this weaker model, they also exist in the standard model. In addition, we use the computational fuzzy vault to construct the first reusable fuzzy extractor (Boyen, CCS 2004) supporting a linear fraction of errors.

Posted Content
TL;DR: Banshee as mentioned in this paper is a new DRAM cache design that optimizes for both in- and off-package DRAM bandwidth efficiency, treated as a first-class design constraint, without degrading access latency.
Abstract: Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in- and off-package DRAM bandwidth efficiency without degrading access latency. The key ideas are to eliminate the in-package DRAM bandwidth overheads due to costly tag accesses through the virtual memory mechanism and to incorporate a bandwidth-aware frequency-based replacement policy that is biased to reduce unnecessary traffic to off-package DRAM. Our extensive evaluation shows that Banshee provides significant performance improvement and traffic reduction over state-of-the-art latency-optimized DRAM cache designs.

Journal ArticleDOI
TL;DR: PriviPK uniquely combines important privacy properties such as forward secrecy, deniability (or non-deniability if desired), and user transparency while avoiding the administrative overhead of certificates for asynchronous communication.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This work presents ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware, and designs DVFS controllers that exploit ThreadBeats information for two purposes: improving performance by equalizing thread progress and minimizing runtime under a power budget constraint.
Abstract: Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accelerate to achieve good performance while maintaining energy efficiency or operating under a power budget. Navigating these tradeoffs is especially hard with multi-threaded applications, where performance depends on the relative progress of parallel worker threads between synchronization points. Prior work on chip-level power management for multi-threaded applications has largely relied on indirect heuristics and metrics calculated from low-level performance counters to estimate each thread's progress. However, these indirect metrics are often inaccurate. Instead, we propose to gather progress information directly from software itself. We present ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware. We design DVFS controllers that exploit ThreadBeats information for two purposes: (i) improving performance by equalizing thread progress and (ii) minimizing runtime under a power budget constraint. These controllers reduce wait time at barriers by 77% on average and improve energy-delay product under a power budget by 23% over prior work.
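To give a flavor of what such an annotation might look like, here is a mocked-up, software-only sketch (the API shape and names are guesses for illustration, not ThreadBeats' actual interface; in the real system the beat would be conveyed to hardware and consumed by the DVFS controller).

```python
# Mock of a ThreadBeats-style progress annotation. In the real system the beat
# would be conveyed to hardware (e.g. via a register write); here beats are
# simply counted per thread in software so a controller-like policy can be shown.
import threading
from collections import Counter

_beats = Counter()
_lock = threading.Lock()

def threadbeat():
    """Annotation a worker calls once per unit of work (e.g. per loop chunk)."""
    with _lock:
        _beats[threading.get_ident()] += 1

def lagging_threads():
    """A DVFS controller would boost the threads furthest behind the leader."""
    with _lock:
        if not _beats:
            return []
        lead = max(_beats.values())
        return [tid for tid, n in _beats.items() if n < lead]

def worker(chunks):
    for _ in range(chunks):
        pass            # ... one chunk of real work would go here ...
        threadbeat()    # report progress after each chunk

threads = [threading.Thread(target=worker, args=(n,)) for n in (100, 80, 120)]
for t in threads: t.start()
for t in threads: t.join()
print(lagging_threads())
```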

Posted Content
TL;DR: This paper presents rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN, and also presents rigorous analysis of other LPN solving heuristics, showing that they work well when combined with LF2.
Abstract: Iterative collision search procedures play a key role in developing combinatorial algorithms for the subset sum and learning parity with noise (LPN) problems. In both scenarios, the single-list pair-wise iterative collision search finds the most solutions and offers the best efficiency. However, due to its complex probabilistic structure, no rigorous analysis for it appears to be available to the best of our knowledge. As a result, theoretical works often resort to overly constrained and sub-optimal iterative collision search variants in exchange for analytic simplicity. In this paper, we present rigorous analysis for the single-list pair-wise iterative collision search method and its applications in subset sum and LPN. In the LPN literature, the method is known as the LF2 heuristic. Besides LF2, we also present rigorous analysis of other LPN solving heuristics and show that they work well when combined with LF2. Putting it together, we significantly narrow the gap between theoretical and heuristic algorithms for LPN.