Author

Srinivas Devadas

Bio: Srinivas Devadas is an academic researcher at the Massachusetts Institute of Technology. He has contributed to research in topics including Sequential logic and Combinational logic. He has an h-index of 88 and has co-authored 480 publications receiving 31,897 citations. His previous affiliations include the University of California, Berkeley and Cornell University.


Papers
Posted Content
TL;DR: This work improves the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption and presents a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case.
Abstract: We present new protocols for Byzantine state machine replication and Byzantine agreement in the synchronous and authenticated setting. The celebrated PBFT state machine replication protocol tolerates $f$ Byzantine faults in an asynchronous setting using $3f+1$ replicas, and has since been studied or deployed by numerous works. In this work, we improve the Byzantine fault tolerance threshold to $n=2f+1$ by utilizing a relaxed synchrony assumption. We present a synchronous state machine replication protocol that commits a decision every 3 rounds in the common case. The key challenge is to ensure quorum intersection at one honest replica. Our solution is to rely on the synchrony assumption to form a post-commit quorum of size $2f+1$, which intersects at $f+1$ replicas with any pre-commit quorums of size $f+1$. Our protocol also solves synchronous authenticated Byzantine agreement in expected 8 rounds. The best previous solution (Katz and Koo, 2006) requires expected 24 rounds. Our protocols may be applied to build Byzantine fault tolerant systems or improve cryptographic protocols such as cryptocurrencies when synchrony can be assumed.
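
As an illustration of the quorum-intersection argument in the abstract, the short sketch below (hypothetical code, not from the paper) checks that with n = 2f+1 replicas, a post-commit quorum of size 2f+1 and a pre-commit quorum of size f+1 must overlap in at least f+1 replicas, at least one of which is honest when at most f are Byzantine.

    # Hedged sketch of the quorum-intersection bound; names are illustrative.
    def min_intersection(n, q1, q2):
        # Any two quorums of sizes q1 and q2 drawn from n replicas
        # overlap in at least q1 + q2 - n replicas (pigeonhole).
        return max(0, q1 + q2 - n)

    for f in range(1, 6):
        n = 2 * f + 1                                # total replicas
        post_commit, pre_commit = 2 * f + 1, f + 1
        overlap = min_intersection(n, post_commit, pre_commit)
        # With at most f Byzantine replicas, an overlap of f + 1
        # guarantees at least one honest replica in common.
        assert overlap >= f + 1
        print(f"f={f}: quorums intersect in >= {overlap} replicas")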

36 citations

Proceedings ArticleDOI
19 Jun 2014
TL;DR: This work proposes a locality-aware selective data replication protocol for the last-level cache (LLC) that aims to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low.
Abstract: Next generation multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low overhead yet highly accurate in-hardware run-time classification of data locality at the cache line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes.
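
The following is a minimal, hypothetical sketch of the kind of reuse-based classification the abstract describes; the threshold, counter layout, and names are assumptions for illustration, not the protocol's actual hardware logic.

    # Hypothetical sketch: classify cache lines by reuse count and replicate
    # only high-locality lines in the requesting core's LLC slice.
    from collections import defaultdict

    REUSE_THRESHOLD = 4          # assumed threshold, not from the paper

    class LocalityClassifier:
        def __init__(self):
            self.reuse = defaultdict(int)        # per (core, line) reuse counter

        def on_access(self, core, line):
            self.reuse[(core, line)] += 1
            # Replicate locally only once the line has shown high reuse;
            # low-reuse lines keep a single home copy to limit LLC pressure.
            return self.reuse[(core, line)] >= REUSE_THRESHOLD

    clf = LocalityClassifier()
    for _ in range(5):
        replicate = clf.on_access(core=0, line=0x80)
    print("replicate line 0x80 at core 0:", replicate)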

36 citations

Book ChapterDOI
12 Nov 2017
TL;DR: In this paper, the authors propose the notion of bandwidth hard functions to reduce an ASIC's energy advantage. They point out that the memory hardness approach is an incomplete solution, since it only attempts to resist an ASIC's area advantage while overlooking the more important energy advantage.
Abstract: Cryptographic hash functions have wide applications including password hashing, pricing functions for spam and denial-of-service countermeasures, and proof of work in cryptocurrencies. Recent progress on ASIC (Application Specific Integrated Circuit) hash engines raises concerns about the security of the above applications. This leads to a growing interest in ASIC resistant hash functions and ASIC resistant proof of work schemes, i.e., those that do not give ASICs a huge advantage. The standard approach towards ASIC resistance today is through memory hard functions or memory hard proof of work schemes. However, we observe that the memory hardness approach is an incomplete solution. It only attempts to provide resistance to an ASIC’s area advantage but overlooks the more important energy advantage. In this paper, we propose the notion of bandwidth hard functions to reduce an ASIC’s energy advantage. CPUs cannot compete with ASICs for energy efficiency in computation, but we can rely on memory accesses to reduce an ASIC’s energy advantage because the energy costs of memory accesses are comparable for ASICs and CPUs. We propose a model for hardware energy cost that has sound foundations in practice. We then analyze the bandwidth hardness property of ASIC resistant candidates. We find scrypt, Catena-BRG and Balloon are bandwidth hard with suitable parameters. Lastly, we observe that a capacity hard function is not necessarily bandwidth hard, with a stacked double butterfly graph being a counterexample.
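
A back-of-the-envelope version of the energy argument: if computation energy on an ASIC is far lower than on a CPU but memory access energy is comparable, forcing many memory accesses caps the ASIC's overall energy advantage. All constants below are placeholders, not the paper's model parameters.

    # Hedged sketch of the energy-advantage argument; numbers are illustrative only.
    def total_energy(ops, mem_accesses, e_op, e_mem):
        return ops * e_op + mem_accesses * e_mem

    ops, mem = 10**9, 10**7
    cpu  = total_energy(ops, mem, e_op=100e-12, e_mem=10e-9)   # CPU: costly compute
    asic = total_energy(ops, mem, e_op=1e-12,   e_mem=10e-9)   # ASIC: cheap compute,
                                                               # similar memory energy
    print("ASIC energy advantage: %.1fx" % (cpu / asic))
    # The more a function is dominated by memory traffic (bandwidth hard),
    # the closer this ratio falls toward 1.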

36 citations

Proceedings ArticleDOI
16 May 1988
TL;DR: Algorithms for Boolean decomposition are presented, which decompose a two-level logic function into a cascade of smaller two-level logic functions, such that the overall area of the resulting logic network is minimized.
Abstract: The authors present algorithms for Boolean decomposition, which decompose a two-level logic function into a cascade of smaller two-level logic functions, such that the overall area of the resulting logic network is minimized. The algorithms are based on multiple-valued minimization. Given a PLA (programmable logic array), a subset of inputs to the PLA is selected. This selection step incorporates a novel algorithm which selects a set of inputs such that the cardinality of the multiple-valued cover, produced by representing all combinations of these inputs as different values of a single multiple-valued variable, is much smaller than the original binary cover cardinality. A relatively small size for the multiple-valued cover implies that the number of good Boolean factors contained in this subset of inputs is large. These inputs are re-encoded to satisfy the constraints given in the multiple-valued cover, thus producing a binary cover for the original PLA whose cardinality equals the multiple-valued cover cardinality.
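
To make the multiple-valued encoding step concrete, the toy sketch below groups two binary PLA inputs into one 4-valued variable; it is an illustration of the idea only, not the paper's minimization algorithm.

    # Toy illustration (not the paper's algorithm): treat a chosen pair of
    # binary inputs (a, b) as a single 4-valued variable v = 2*a + b, so a
    # multiple-valued minimizer can merge cubes that differ only in (a, b).
    def to_multivalued(cube):
        a, b, rest = cube            # (a, b): selected inputs, rest: remaining literals
        return 2 * a + b, rest

    binary_cover = [
        (0, 0, "1-"),                # cubes over inputs a, b and remaining literals
        (0, 1, "1-"),
        (1, 0, "1-"),
    ]
    mv_cover = {}
    for cube in binary_cover:
        v, rest = to_multivalued(cube)
        mv_cover.setdefault(rest, set()).add(v)
    # The three binary cubes collapse to one multiple-valued cube:
    # v in {0, 1, 2} with the same remaining part "1-".
    print(mv_cover)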

35 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines, and relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line.
Abstract: Next generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce the overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
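
As with the replication work above, a rough sketch of the per-line classification idea: each cache line starts out logically shared and is promoted to private caching only after repeated reuse by the same core. The threshold and state names are assumptions for illustration, not the protocol's actual states.

    # Hypothetical per-cache-line mode classifier (illustrative only).
    PROMOTE_AFTER = 2          # assumed reuse threshold, not from the paper

    class LineState:
        def __init__(self):
            self.mode = "shared"       # start with logically shared caching
            self.reuse = 0
            self.last_core = None

        def access(self, core):
            # Spatio-temporal reuse: repeated accesses by the same core.
            self.reuse = self.reuse + 1 if core == self.last_core else 0
            self.last_core = core
            if self.reuse >= PROMOTE_AFTER:
                self.mode = "private"   # allow low-latency private caching
            elif self.reuse == 0:
                self.mode = "shared"    # demote when another core intervenes
            return self.mode

    line = LineState()
    print([line.access(core) for core in (0, 0, 0, 1, 1)])
    # -> ['shared', 'shared', 'private', 'shared', 'shared']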

35 citations


Cited by
Journal ArticleDOI
TL;DR: TaintDroid is an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data by leveraging Android's virtualized execution environment.
Abstract: Today’s smartphone operating systems frequently fail to provide users with visibility into how third-party applications collect and share their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid enables realtime analysis by leveraging Android’s virtualized execution environment. TaintDroid incurs only 32% performance overhead on a CPU-bound microbenchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, in our 2010 study we found 20 applications potentially misused users’ private information; so did a similar fraction of the tested applications in our 2012 study. Monitoring the flow of privacy-sensitive data with TaintDroid provides valuable input for smartphone users and security service firms seeking to identify misbehaving applications.
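
A minimal illustration of dynamic taint tracking (hypothetical and far simpler than TaintDroid's VM-level instrumentation): values carry taint labels that propagate through operations, and a sink check flags tainted data leaving the device.

    # Toy taint-tracking sketch; TaintDroid itself instruments the Dalvik VM,
    # this only illustrates label propagation through operations.
    class Tainted:
        def __init__(self, value, labels=frozenset()):
            self.value, self.labels = value, frozenset(labels)

        def __add__(self, other):
            # Taint propagation: the result carries the union of input labels.
            o_val = getattr(other, "value", other)
            o_lab = getattr(other, "labels", frozenset())
            return Tainted(self.value + o_val, self.labels | o_lab)

    def network_send(data):
        # Sink check: report if sensitive labels reach the network.
        if data.labels:
            print("ALERT: leaking data tainted with", set(data.labels))

    location = Tainted("42.36,-71.09", {"GPS"})
    message = Tainted("loc=") + location
    network_send(message)      # -> ALERT: leaking data tainted with {'GPS'}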

2,983 citations

Proceedings ArticleDOI
04 Oct 2010
TL;DR: Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, this work found 68 instances of misappropriation of users' location and device identification information across 20 applications.
Abstract: Today's smartphone operating systems frequently fail to provide users with adequate control over and visibility into how third-party applications use their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid provides realtime analysis by leveraging Android's virtualized execution environment. TaintDroid incurs only 14% performance overhead on a CPU-bound micro-benchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, we found 68 instances of potential misuse of users' private information across 20 applications. Monitoring sensitive data with TaintDroid provides informed use of third-party applications for phone users and valuable input for smartphone security service firms seeking to identify misbehaving applications.

2,379 citations

Journal ArticleDOI
TL;DR: The OBDD data structure is described and a number of applications that have been solved by OBDD-based symbolic analysis are surveyed.
Abstract: Ordered Binary-Decision Diagrams (OBDDs) represent Boolean functions as directed acyclic graphs. They form a canonical representation, making testing of functional properties such as satisfiability and equivalence straightforward. A number of operations on Boolean functions can be implemented as graph algorithms on OBDD data structures. Using OBDDs, a wide variety of problems can be solved through symbolic analysis. First, the possible variations in system parameters and operating conditions are encoded with Boolean variables. Then the system is evaluated for all variations by a sequence of OBDD operations. Researchers have thus solved a number of problems in digital-system design, finite-state system analysis, artificial intelligence, and mathematical logic. This paper describes the OBDD data structure and surveys a number of applications that have been solved by OBDD-based symbolic analysis.
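
A small hedged sketch of the data structure: an OBDD node tests one variable and points to low/high subgraphs, and evaluation follows a single path from the root to a terminal. This is a bare illustration, without the reduction rules and unique table that make real OBDD packages canonical.

    # Minimal OBDD-style sketch (no reduction or hashing, just the idea).
    class Node:
        def __init__(self, var, low, high):
            self.var, self.low, self.high = var, low, high   # low: var=0, high: var=1

    def evaluate(node, assignment):
        # Follow one root-to-terminal path guided by the variable assignment.
        while isinstance(node, Node):
            node = node.high if assignment[node.var] else node.low
        return node   # terminal True/False

    # f(x1, x2) = x1 AND x2 with variable order x1 < x2
    f = Node("x1", False, Node("x2", False, True))
    print(evaluate(f, {"x1": 1, "x2": 1}))   # True
    print(evaluate(f, {"x1": 1, "x2": 0}))   # False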

2,196 citations

Proceedings ArticleDOI
04 Jun 2007
TL;DR: This work presents PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describes how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.
Abstract: Physical Unclonable Functions (PUFs) are innovative circuit primitives that extract secrets from physical characteristics of integrated circuits (ICs). We present PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describe how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.
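
A toy software model of the delay-based idea (purely illustrative; real PUFs measure analog delays arising from manufacturing variation on silicon): each simulated "chip" gets its own random per-stage delay differences, and a challenge selects how those differences accumulate, yielding a chip-unique response bit.

    # Illustrative software model of a delay-based PUF; the seed stands in
    # for per-chip manufacturing variation.
    import random

    class SimulatedPUF:
        def __init__(self, seed, stages=64):
            rng = random.Random(seed)
            self.delays = [rng.gauss(0, 1) for _ in range(stages)]

        def response(self, challenge):
            # Challenge bits choose whether each stage's delay difference
            # adds or subtracts; the sign of the total gives the response bit.
            total = sum(d if c else -d for d, c in zip(self.delays, challenge))
            return int(total > 0)

    challenge = [random.randint(0, 1) for _ in range(64)]
    chip_a, chip_b = SimulatedPUF(seed=1), SimulatedPUF(seed=2)
    print(chip_a.response(challenge), chip_b.response(challenge))  # chip-specific bits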

2,014 citations

Proceedings Article
01 Jan 2007

1,944 citations