scispace - formally typeset
Search or ask a question

Showing papers by "Srinivas Devadas published in 2016"


Posted Content
TL;DR: In this article, the authors present a detailed and structured presentation of the publicly available information on SGX, a series of intelligent guesses about some important but undocumented aspects of SGX.
Abstract: Intel’s Software Guard Extensions (SGX) is a set of extensions to the Intel architecture that aims to provide integrity and confidentiality guarantees to securitysensitive computation performed on a computer where all the privileged software (kernel, hypervisor, etc) is potentially malicious. This paper analyzes Intel SGX, based on the 3 papers [14, 78, 137] that introduced it, on the Intel Software Developer’s Manual [100] (which supersedes the SGX manuals [94, 98]), on an ISCA 2015 tutorial [102], and on two patents [108, 136]. We use the papers, reference manuals, and tutorial as primary data sources, and only draw on the patents to fill in missing information. This paper’s contributions are a summary of the Intel-specific architectural and micro-architectural details needed to understand SGX, a detailed and structured presentation of the publicly available information on SGX, a series of intelligent guesses about some important but undocumented aspects of SGX, and an analysis of SGX’s security properties.

834 citations


Proceedings Article
01 Jan 2016
TL;DR: Sanctum offers the same promise as Intel’s Software Guard Extensions (SGX), namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program's memory access patterns.
Abstract: Sanctum offers the same promise as Intel’s Software Guard Extensions (SGX), namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program’s memory access patterns. Sanctum shuns unnecessary complexity, leading to a simpler security analysis. We follow a principled approach to eliminating entire attack surfaces through isolation, rather than plugging attack-specific privacy leaks. Most of Sanctum’s logic is implemented in trusted software, which does not perform cryptographic operations using keys, and is easier to analyze than SGX’s opaque microcode, which does. Our prototype targets a Rocket RISC-V core, an open implementation that allows any researcher to reason about its security properties. Sanctum’s extensions can be adapted to other processor cores, because we do not change any major CPU building block. Instead, we add hardware at the interfaces between generic building blocks, without impacting cycle time. Sanctum demonstrates that strong software isolation is achievable with a surprisingly small set of minimally invasive hardware changes, and a very reasonable overhead.

475 citations


Journal ArticleDOI
TL;DR: This work presents a system-level approach that allows a so-called strong PUF to be used for lightweight authentication in a manner that is heuristically secure against today's best machine learning methods through a worst-case CRP exposure algorithmic validation.
Abstract: We present a lightweight PUF-based authentication approach that is practical in settings where a server authenticates a device, and for use cases where the number of authentications is limited over a device's lifetime. Our scheme uses a server-managed challenge/response pair (CRP) lockdown protocol: unlike prior approaches, an adaptive chosen-challenge adversary with machine learning capabilities cannot obtain new CRPs without the server's implicit permission. The adversary is faced with the problem of deriving a PUF model with a limited amount of machine learning training data. Our system-level approach allows a so-called strong PUF to be used for lightweight authentication in a manner that is heuristically secure against today's best machine learning methods through a worst-case CRP exposure algorithmic validation. We also present a degenerate instantiation using a weak PUF that is secure against computationally unrestricted adversaries, which includes any learning adversary, for practical device lifetimes and read-out rates. We validate our approach using silicon PUF data, and demonstrate the feasibility of supporting 10, 1,000, and 1M authentications, including practical configurations that are not learnable with polynomial resources, e.g., the number of CRPs and the attack runtime, using recent results based on the probably-approximately-correct (PAC) complexity-theoretic framework.

176 citations


Book ChapterDOI
10 Jan 2016
TL;DR: The Onion ORAM is the first concrete instantiation of a constant bandwidth blowup ORAM under standard assumptions, and proposes novel techniques to achieve security against a malicious server, without resorting to expensive and non-standard techniques such as SNARKs.
Abstract: We present Onion ORAM, an Oblivious RAM (ORAM) with constant worst-case bandwidth blowup that leverages poly-logarithmic server computation to circumvent the logarithmic lower bound on ORAM bandwidth blowup. Our construction does not require fully homomorphic encryption, but employs an additively homomorphic encryption scheme such as the Damgard-Jurik cryptosystem, or alternatively a BGV-style somewhat homomorphic encryption scheme without bootstrapping. At the core of our construction is an ORAM scheme that has “shallow circuit depth” over the entire history of ORAM accesses. We also propose novel techniques to achieve security against a malicious server, without resorting to expensive and non-standard techniques such as SNARKs. To the best of our knowledge, Onion ORAM is the first concrete instantiation of a constant bandwidth blowup ORAM under standard assumptions (even for the semi-honest setting).

153 citations


Proceedings ArticleDOI
26 Jun 2016
TL;DR: TicToc is presented, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes and achieves up to 92% better throughput while reducing the abort rate by 3.3x over these previous algorithms.
Abstract: Concurrency control for on-line transaction processing (OLTP) database management systems (DBMSs) is a nasty game. Achieving higher performance on emerging many-core systems is difficult. Previous research has shown that timestamp management is the key scalability bottleneck in concurrency control algorithms. This prevents the system from scaling to large numbers of cores. In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior T/O schemes. TicToc relies on a novel and provably correct data-driven timestamp management protocol. Instead of assigning timestamps to transactions, this protocol assigns read and write timestamps to data items and uses them to lazily compute a valid commit timestamp for each transaction. TicToc removes the need for centralized timestamp allocation, and commits transactions that would be aborted by conventional T/O schemes. We implemented TicToc along with four other concurrency control algorithms in an in-memory, shared-everything OLTP DBMS and compared their performance on different workloads. Our results show that TicToc achieves up to 92% better throughput while reducing the abort rate by 3.3x over these previous algorithms.

140 citations


Journal ArticleDOI
01 Apr 2016
TL;DR: Riffle consists of a small set of anonymity servers and a large number of users, and guarantees anonymity among all honest clients as long as there exists at least one honest server, a bandwidth and computation efficient communication system with strong anonymity.
Abstract: Existing anonymity systems sacrifice anonymity for efficient communication or vice-versa. Onion-routing achieves low latency, high bandwidth, and scalable anonymous communication, but is susceptible to traffic analysis attacks. Designs based on DC-Nets, on the other hand, protect the users against traffic analysis attacks, but sacrifice bandwidth. Verifiable mixnets maintain strong anonymity with low bandwidth overhead, but suffer from high computation overhead instead. In this paper, we present Riffle, a bandwidth and computation efficient communication system with strong anonymity. Riffle consists of a small set of anonymity servers and a large number of users, and guarantees anonymity among all honest clients as long as there exists at least one honest server. Riffle uses a new hybrid verifiable shuffle technique and private information retrieval for bandwidthand computation-efficient anonymous communication. Our evaluation of Riffle in file sharing and microblogging applications shows that Riffle can achieve a bandwidth of over 100KB/s per user in an anonymity set of 200 users in the case of file sharing, and handle over 100,000 users with less than 10 second latency in the case of microblogging.

101 citations


Book ChapterDOI
31 Oct 2016
TL;DR: In this article, the authors show that Balloon hash has tighter space-hardness than previously believed and consistent spacehardness throughout its computation, and construct PoS from stacked expander graphs.
Abstract: Recently, proof of space PoS has been suggested as a more egalitarian alternative to the traditional hash-based proof of work. In PoS, a prover proves to a verifier that it has dedicated some specified amount of space. A closely related notion is memory-hard functions MHF, functions that require a lot of memory/space to compute. While making promising progress, existing PoS and MHF have several problems. First, there are large gaps between the desired space-hardness and what can be proven. Second, it has been pointed out that PoS and MHF should require a lot of space not just at some point, but throughout the entire computation/protocol; few proposals considered this issue. Third, the two existing PoS constructions are both based on a class of graphs called superconcentrators, which are either hard to construct or add a logarithmic factor overhead to efficiency. In this paper, we construct PoS from stacked expander graphs. Our constructions are simpler, more efficient and have tighter provable space-hardness than prior works. Our results also apply to a recent MHF called Balloon hash. We show Balloon hash has tighter space-hardness than previously believed and consistent space-hardness throughout its computation.

44 citations


Posted Content
01 Jan 2016
TL;DR: This paper constructs PoS from stacked expander graphs, which are simpler, more efficient and have tighter provable space-hardness than prior works and applies to a recent MHF called Balloon hash, which has tighter space- Hardness than previously believed.
Abstract: Recently, proof of space (PoS) has been suggested as a more egalitarian alternative to the traditional hash-based proof of work. In PoS, a prover proves to a verifier that it has dedicated some specified amount of space. A closely related notion is memory-hard functions (MHF), functions that require a lot of memory/space to compute. While making promising progress, existing PoS and MHF have several problems. First, there are large gaps between the desired space-hardness and what can be proven. Second, it has been pointed out that PoS and MHF should require a lot of space not just at some point, but throughout the entire computation/protocol; few proposals considered this issue. Third, the two existing PoS constructions are both based on a class of graphs called superconcentrators, which are either hard to construct or add a logarithmic factor overhead to efficiency. In this paper, we construct PoS from stacked expander graphs. Our constructions are simpler, more efficient and have tighter provable space-hardness than prior works. Our results also apply to a recent MHF called Balloon hash. We show Balloon hash has tighter space-hardness than previously believed and consistent space-hardness throughout its computation.

36 citations


Posted Content
TL;DR: Beaver is presented, a decentralized anonymous marketplace that is resistant against Sybil attacks on vendor reputation, while preserving user anonymity.
Abstract: Amid growing concerns of government surveillance and corporate data sharing, web users increasingly demand tools for preserving their privacy without placing trust in a third party. Unfortunately, existing centralized reputation systems need to be trusted for either privacy, correctness, or both. Existing decentralized approaches, on the other hand, are either vulnerable to Sybil attacks, present inconsistent views of the network, or leak critical information about the actions of their users. In this paper, we present Beaver, a decentralized anonymous marketplace that is resistant against Sybil attacks on vendor reputation, while preserving user anonymity. Beaver allows its participants to enjoy open enrollment, and provides every user with the same global view of the reputation of other users through public ledger based consensus. Various cryptographic primitives allow Beaver to offer high levels of usability and practicality, along with strong anonymity guarantees. Operations such as creating a listing, purchasing an item, and leaving feedback take just milliseconds to compute and require generating just a few kilobytes of state while often constructing convenient anonymity sets of hundreds of transactions.

32 citations


Proceedings ArticleDOI
11 Sep 2016
TL;DR: Tardis as mentioned in this paper is a coherence protocol that removes cache invalidation using logical timestamps and achieves excellent scalability, but it does not support the sequential consistency (SC) model, limiting its applicability.
Abstract: Cache coherence scalability is a big challenge in shared memory systems. Traditional protocols do not scale due to the storage and traffic overhead of cache invalidation. Tardis, a recently proposed coherence protocol, removes cache invalidation using logical timestamps and achieves excellent scalability. The original Tardis protocol, however, only supports the Sequential Consistency (SC) memory model, limiting its applicability. Tardis also incurs extra network traffic on some benchmarks due to renew messages, and has suboptimal performance when the program uses spinning to communicate between threads. In this paper, we address these downsides of Tardis protocol and make it significantly more practical. Specifically, we discuss the architectural, memory system and protocol changes required in order to implement the TSO consistency model on Tardis, and prove that the modified protocol satisfies TSO. We also describe modifications for Partial Store Order (PSO) and Release Consistency (RC). Finally, we propose optimizations for better leasing policies and to handle program spinning. On a set of benchmarks, optimized Tardis improves on a full-map directory protocol in the metrics of performance, storage and network traffic, while being simpler to implement.

11 citations


Journal ArticleDOI
TL;DR: A locality-aware selective data replication protocol for the last-level cache (LLC) is proposed to lower memory access latency and energy by only replicating cache lines with high reuse in the LLC slice of the requesting core, while simultaneously keep the off-chip miss rate low.
Abstract: Next generation large single-chip multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of on-chip cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). The goal is to lower memory access latency and energy by only replicating cache lines with high reuse in the LLC slice of the requesting core, while simultaneously keep the off-chip miss rate low. The approach relies on low-overhead yet highly accurate in-hardware runtime cache line level classifier that only allows replication of cache lines with high reuse. Furthermore, a classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. On a set of parallel benchmarks, the proposed protocol reduces overall energy by 14.7, 10.7, 10.5, and 16.7 % and completion time by 2.5, 6.5, 4.5, and 9.5 % when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA, and Static-NUCA LLC management schemes. An efficient classifier implementation is evaluated with an overhead of 5.44 KB, which translates to only 1.58 % on top of the Static-NUCA baseline's cache related per-core storage.

Posted Content
TL;DR: Catena implements a log as an OP_RETURN transaction chain and prevents forks in the log by leveraging Bitcoin’s security against double spends, and can be used to secure many systems today such as public-key directories, Tor directory servers or software transparency schemes.
Abstract: We present Catena, an efficiently-verifiable Bitcoin witnessing scheme. Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of applicationspecific statements managed by an adversarial server. Catena implements a log as an OP_RETURN transaction chain and prevents forks in the log by leveraging Bitcoin’s security against double spends. Specifically, if a log server wants to equivocate it has to double spend a Bitcoin transaction output. Thus, Catena logs are as hard to fork as the Bitcoin blockchain: an adversary without a large fraction of the network’s computational power cannot fork Bitcoin and thus cannot fork a Catena log either. However, different from previous Bitcoin-based work, Catena decreases the bandwidth requirements of log auditors from 90 GB to only tens of megabytes. More precisely, our clients only need to download all Bitcoin block headers (currently less than 35 MB) and a small, 600-byte proof for each statement in a block. We implemented Catena in Java using the bitcoinj library and used it to extend CONIKS, a recent key transparency scheme, to witness its public-key directory in the Bitcoin blockchain where it can be efficiently verified by auditors. We show that Catena can be used to secure many systems today such as public-key directories, Tor directory servers or software transparency schemes.

01 Jan 2016
TL;DR: Evaluating Atom on a distributed network of 1,024 dual-core servers is evaluated and it is demonstrated that the system can anonymize more than a million Tweet-length messages with less than 30 minutes of latency.
Abstract: Atom is an anonymity system that protects against traffic-analysis attacks and avoids the scalability bottlenecks of traditional mix-net- and DC-net-based anonymity systems. Atom consists of a distributed network of mix servers connected with a carefully structured link topology. Unlike many anonymous communication system with traffic-analysis protection, each Atom server touches only a small a fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: each user's anonymity set consists of all honest users in the system, against an active adversary who controls the entire network, a constant fraction of the system's servers, and any number of malicious users. We evaluate Atom on a distributed network of 1,024 dual-core servers and demonstrate that the system can anonymize more than a million Tweet-length messages with less than 30 minutes of latency.

Journal ArticleDOI
TL;DR: A holistic locality-aware cache hierarchy management protocol for large-scale multicores that improves on-chip data access latency and energy consumption by intelligently bypassing cache line replication in the L1 caches, and/or intelligently replicating cache lines in the Last-Level Cache.
Abstract: The trend of increasing the number of cores to achieve higher performance has challenged efficient management of on-chip data. Moreover, many emerging applications process massive amounts of data with varying degrees of locality. Therefore, exploiting locality to improve on-chip traffic and resource utilization is of fundamental importance. Conventional multicore cache management schemes either manage the private cache (L1) or the Last-Level Cache (LLC), while ignoring the other. We propose a holistic locality-aware cache hierarchy management protocol for large-scale multicores. The proposed scheme improves on-chip data access latency and energy consumption by intelligently bypassing cache line replication in the L1 caches, and/or intelligently replicating cache lines in the LLC. The approach relies on low overhead yet highly accurate in-hardware runtime classification of data locality at both L1 cache and the LLC. The decision to bypass L1 and/or replicate in LLC is then based on the measured reuse at the fine granularity of cache lines. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional cache coherence protocols. Moreover, the complexity of the protocol is low since no additional coherence states are created. However, the proposed classifier incurs a 5.6KB per-core storage overhead. On a set of parallel benchmarks, the locality-aware protocol reduces average energy consumption by 26% and completion time by 16%, when compared to the state-of-the-art Reactive-NUCA multicore cache management scheme.

Posted Content
01 Jan 2016
TL;DR: In this paper, the authors presented a fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a noise-avoiding trapdoor.
Abstract: We present a fuzzy extractor whose security can be reduced to the hardness of Learning Parity with Noise (LPN) and can efficiently correct a constant fraction of errors in a biometric source with a “noise-avoiding trapdoor.” Using this computational fuzzy extractor, we present a stateless construction of a cryptographically-secure Physical Unclonable Function. Our construct requires no non-volatile (permanent) storage, secure or otherwise, and its computational security can be reduced to the hardness of an LPN variant under the random oracle model. The construction is “stateless,” because there is no information stored between subsequent queries, which mitigates attacks against the PUF via tampering. Moreover, our stateless construction corresponds to a PUF whose outputs are free of noise because of internal error-correcting capability, which enables a host of applications beyond authentication. We describe the construction, provide a proof of computational security, analysis of the security parameter for system parameter choices, and present experimental evidence that the construction is practical and reliable under a wide environmental range.

Posted Content
TL;DR: It is shown that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes, over 23x faster than prior systems with similar privacy guarantees.
Abstract: Atom is an anonymous messaging system that protects against traffic-analysis attacks. Unlike many prior systems, each Atom server touches only a small fraction of the total messages routed through the network. As a result, the system's capacity scales near-linearly with the number of servers. At the same time, each Atom user benefits from "best possible" anonymity: a user is anonymous among all honest users of the system, against an active adversary who controls the entire network, a portion of the system's servers, and any number of malicious users. The architectural ideas behind Atom have been known in theory, but putting them into practice requires new techniques for (1) avoiding the reliance on heavy general-purpose multi-party computation protocols, (2) defeating active attacks by malicious servers at minimal performance cost, and (3) handling server failure and churn. Atom is most suitable for sending a large number of short messages, as in a microblogging application or a high-security communication bootstrapping ("dialing") for private messaging systems. We show that, on a heterogeneous network of 1,024 servers, Atom can transit a million Tweet-length messages in 28 minutes. This is over 23x faster than prior systems with similar privacy guarantees.

Posted Content
TL;DR: Catena as discussed by the authors is an efficient verifiable Bitcoin witnessing scheme that enables any number of thin clients, such as mobile phones, to agree on a log of application-specific statements managed by an adversarial server.
Abstract: We present Catena, an efficiently-verifiable Bitcoinwitnessing scheme. Catena enables any number of thin clients, such as mobile phones, to efficiently agree on a log of application-specific statements managed by an adversarial server. Catenaimplements a log as an OP_RETURN transaction chain andprevents forks in the log by leveraging Bitcoin's security againstdouble spends. Specifically, if a log server wants to equivocate ithas to double spend a Bitcoin transaction output. Thus, Catenalogs are as hard to fork as the Bitcoin blockchain: an adversarywithout a large fraction of the network's computational powercannot fork Bitcoin and thus cannot fork a Catena log either. However, different from previous Bitcoin-based work, Catenadecreases the bandwidth requirements of log auditors from 90GB to only tens of megabytes. More precisely, our clients onlyneed to download all Bitcoin block headers (currently less than35 MB) and a small, 600-byte proof for each statement in a block. We implement Catena in Java using the bitcoinj library and use itto extend CONIKS, a recent key transparency scheme, to witnessits public-key directory in the Bitcoin blockchain where it can beefficiently verified by auditors. We show that Catena can securemany systems today, such as public-key directories, Tor directoryservers and software transparency schemes.


Journal ArticleDOI
TL;DR: To achieve pervasive authentication, it is useful for the authentication modality to be easy to integrate with modern electronic devices and to beeasy for non-experts to use.
Abstract: Authentication of physical items is an age-old problem. Common approaches include the use of bar codes, QR codes, holograms, and RFID (radio-frequency identification) tags. Traditional RFID tags and bar codes use a public identifier as a means of authenticating. A public identifier, however, is static: it is the same each time when queried and can be easily copied by an adversary. Holograms can also be viewed as public identifiers: a knowledgeable verifier knows all the attributes to inspect visually. It is difficult to make hologram-based authentication pervasive; a casual verifier does not know all the attributes to look for. Further, to achieve pervasive authentication, it is useful for the authentication modality to be easy to integrate with modern electronic devices (e.g., mobile smartphones) and to be easy for non-experts to use.