Author
Srinivas Devadas
Other affiliations: University of California, Berkeley, Cornell University, Bar-Ilan University ...
Bio: Srinivas Devadas is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Sequential logic & Combinational logic. The author has an h-index of 88 and has co-authored 480 publications receiving 31,897 citations. Previous affiliations of Srinivas Devadas include University of California, Berkeley & Cornell University.
Papers published on a yearly basis
Papers
•
01 Jan 1996
TL;DR: This thesis presents techniques for code generation and optimization that target embedded digital signal processors that have proven to be effective in improving the performance and reducing the size of compiled software.
Abstract: The advent of deep submicron processing technology has made it possible and desirable to integrate a processor core, a program ROM, and application-specific circuitry all on a single IC. As the complexity of embedded software grows, high-level languages such as C and C++ are increasingly employed in writing embedded software. Consequently, high-level language compilers have become an essential tool in the development of embedded systems. Fixed-point digital signal processors are among the most commonly embedded cores, due to their favorable performance–cost characteristics. However, these architectures are usually designed and optimized for their application domain, and pose challenges for compiler technology. Traditional compiler optimizations, though necessary, are insufficient for generating efficient and compact code. Therefore, new optimizations are required to produce code of the highest quality in a reasonable amount of time. In this thesis the author presents techniques for code generation and optimization that target embedded digital signal processors. These techniques have proven to be effective in improving the performance and reducing the size of compiled software. This thesis emphasizes optimization techniques; only by gaining a deeper understanding of the problems involved can we then apply them to a wider class of architectures. Keywords: compiler optimizations, digital signal processors, embedded systems. Thesis Supervisor: Srinivas Devadas. Title: Associate Professor of Electrical Engineering and Computer Science.
93 citations
••
01 May 1998
TL;DR: This work presents a new HDL-satisfiability checking algorithm that works directly on the HDL model. The primary feature of this algorithm is a seamless integration of linear-programming techniques for feasibility checking of arithmetic equations that govern the behavior of datapath modules, and 3-SAT checking for logic equations that govern the behavior of control modules.
Abstract: Our strategy for automatic generation of functional vectors is based on exercising selected paths in the given hardware description language (HDL) model. The HDL model describes interconnections of arithmetic, logic and memory modules. Given a path in the HDL model, the search for input stimuli that exercise the path can be converted into a standard satisfiability checking problem by expanding the arithmetic modules into logic-gates. However, this approach is not very efficient. We present a new HDL-satisfiability checking algorithm that works directly on the HDL model. The primary feature of our algorithm is a seamless integration of linear-programming techniques for feasibility checking of arithmetic equations that govern the behavior of datapath modules, and 3-SAT checking for logic equations that govern the behavior of control modules. This feature is critically important to efficiency, since it avoids module expansion and allows us to work with logic and arithmetic equations whose cardinality tracks the size of the HDL model. We describe the details of the HDL-satisfiability checking algorithm in this paper. Experimental results which show significant speedups over state-of-the-art gate-level satisfiability checking methods are included.
92 citations
••
13 Jun 1997
TL;DR: The optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features.
Abstract: Many application-specific architectures provide indirect addressing modes with auto-increment/decrement arithmetic. Since these architectures generally do not feature an indexed addressing mode, stack-allocated variables must be accessed by allocating address registers and performing address arithmetic. Subsuming address arithmetic into auto-increment/decrement arithmetic improves both the performance and size of the generated code. Our objective in this paper is to provide a method for comprehensively analyzing the performance benefits and hardware cost due to an auto-increment/decrement feature that varies from -l to +l, and allowing access to k address registers in an address generator. We provide this method via a parameterizable optimization algorithm that operates on a procedure-wise basis. Hence, the optimization techniques in a compiler can be used not only to generate efficient or compact code, but also to help the designer of a custom DSP architecture make decisions on address arithmetic features. We present two sets of experimental results based on selected benchmark programs: (1) the values of l and k beyond which there is little or no improvement in performance, and (2) the values of l and k which result in minimum code area.
91 citations
••
14 Mar 2015
TL;DR: This work is the first to prototype Recursive ORAM or ORAM with any integrity scheme in hardware and report area and clock frequency for a complete ORAM design post-synthesis and post-layout using an ASIC flow in a 32 nm commercial process.
Abstract: Oblivious RAM (ORAM) is a cryptographic primitive that hides memory access patterns as seen by untrusted storage. Recently, ORAM has been architected into secure processors. A big challenge for hardware ORAM schemes is how to efficiently manage the Position Map (PosMap), a central component in modern ORAM algorithms. Implemented naively, the PosMap causes ORAM to be fundamentally unscalable in terms of on-chip area. On the other hand, a technique called Recursive ORAM fixes the area problem yet significantly increases ORAM's performance overhead. To address this challenge, we propose three new mechanisms. We propose a new ORAM structure called the PosMap Lookaside Buffer (PLB) and PosMap compression techniques to reduce the performance overhead from Recursive ORAM empirically (the latter also improves the construction asymptotically). Through simulation, we show that these techniques reduce the memory bandwidth overhead needed to support recursion by 95%, reduce overall ORAM bandwidth by 37% and improve overall SPEC benchmark performance by 1.27x. We then show how our PosMap compression techniques further facilitate an extremely efficient integrity verification scheme for ORAM which we call PosMap MAC (PMMAC). For a practical parameterization, PMMAC reduces the amount of hashing needed for integrity checking by >= 68x relative to prior schemes and introduces only 7% performance overhead. We prototype our mechanisms in hardware and report area and clock frequency for a complete ORAM design post-synthesis and post-layout using an ASIC flow in a 32 nm commercial process. With 2 DRAM channels, the design post-layout runs at 1 GHz and has a total area of 0.47 mm². Depending on PLB-specific parameters, the PLB accounts for 10% to 26% area. PMMAC costs 12% of total design area. Our work is the first to prototype Recursive ORAM or ORAM with any integrity scheme in hardware.
90 citations
••
TL;DR: The authors outline a synthesis procedure which, beginning from a state transition graph (STG) description of a sequential machine, produces an optimized, fully and easily testable logic implementation, and which guarantees testability for both Moore and Mealy machines.
Abstract: The authors outline a synthesis procedure which, beginning from a state transition graph (STG) description of a sequential machine, produces an optimized, fully and easily testable logic implementation. This logic-level implementation is guaranteed to be testable for all single stuck-at faults in the combinational logic, and the test sequences for these faults can be obtained using combinational test generation techniques alone. The sequential machine is assumed to have a reset state and be R-reachable. All single stuck-at faults in the combinational logic and the input and output stuck-at faults of the memory elements in the synthesized logic-level automaton can be tested without access to the memory elements using these test sequences. Thus this procedure represents an alternative to a scan design methodology. The area penalty incurred due to the constraints on the optimization is small. The performance of the synthesized design is usually better than that of an unconstrained design optimized for area alone. The authors show that an intimate relationship exists between state assignment and the testability of a sequential machine. They propose a procedure of constrained state assignment and logic optimization which guarantees testability for both Moore and Mealy machines.
90 citations
Cited by
••
TL;DR: TaintDroid is an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data by leveraging Android's virtualized execution environment.
Abstract: Today’s smartphone operating systems frequently fail to provide users with visibility into how third-party applications collect and share their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid enables realtime analysis by leveraging Android’s virtualized execution environment. TaintDroid incurs only 32% performance overhead on a CPU-bound microbenchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, in our 2010 study we found that 20 applications potentially misused users’ private information; so did a similar fraction of the tested applications in our 2012 study. Monitoring the flow of privacy-sensitive data with TaintDroid provides valuable input for smartphone users and security service firms seeking to identify misbehaving applications.
2,983 citations
••
04 Oct 2010
TL;DR: Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, this work found 68 instances of misappropriation of users' location and device identification information across 20 applications.
Abstract: Today's smartphone operating systems frequently fail to provide users with adequate control over and visibility into how third-party applications use their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid provides realtime analysis by leveraging Android's virtualized execution environment. TaintDroid incurs only 14% performance overhead on a CPU-bound micro-benchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, we found 68 instances of potential misuse of users' private information across 20 applications. Monitoring sensitive data with TaintDroid provides informed use of third-party applications for phone users and valuable input for smartphone security service firms seeking to identify misbehaving applications.
2,379 citations
••
TL;DR: The OBDD data structure is described and a number of applications that have been solved by OBDD-based symbolic analysis are surveyed.
Abstract: Ordered Binary-Decision Diagrams (OBDDs) represent Boolean functions as directed acyclic graphs. They form a canonical representation, making testing of functional properties such as satisfiability and equivalence straightforward. A number of operations on Boolean functions can be implemented as graph algorithms on OBDD data structures. Using OBDDs, a wide variety of problems can be solved through symbolic analysis. First, the possible variations in system parameters and operating conditions are encoded with Boolean variables. Then the system is evaluated for all variations by a sequence of OBDD operations. Researchers have thus solved a number of problems in digital-system design, finite-state system analysis, artificial intelligence, and mathematical logic. This paper describes the OBDD data structure and surveys a number of applications that have been solved by OBDD-based symbolic analysis.
2,196 citations
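The canonicity property described in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `BDD` and the methods `mk` and `build` are hypothetical, the variable order is simply the argument order, and the construction enumerates all assignments, so it is only suitable for a handful of variables. It shows how sharing identical nodes and skipping redundant tests makes equivalent functions end up with the same root node.

```python
class BDD:
    """Tiny reduced ordered BDD manager (illustrative sketch only)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Unique table: (var, low, high) -> node id.
        self.nodes = {}
        # Ids 0 and 1 are reserved for the terminals False and True.
        self.next_id = 2

    def mk(self, var, low, high):
        # Reduction rule 1: a test whose two branches agree is redundant.
        if low == high:
            return low
        # Reduction rule 2: share structurally identical nodes.
        key = (var, low, high)
        if key not in self.nodes:
            self.nodes[key] = self.next_id
            self.next_id += 1
        return self.nodes[key]

    def build(self, f, var=0, assignment=()):
        # Shannon expansion of the predicate f, one variable at a time.
        # Exponential in num_vars; real packages combine BDDs instead.
        if var == self.num_vars:
            return 1 if f(*assignment) else 0
        low = self.build(f, var + 1, assignment + (False,))
        high = self.build(f, var + 1, assignment + (True,))
        return self.mk(var, low, high)


bdd = BDD(3)
f = bdd.build(lambda a, b, c: (a and b) or c)
g = bdd.build(lambda a, b, c: (a or c) and (b or c))
contradiction = bdd.build(lambda a, b, c: a and not a)

equivalent = (f == g)                  # equivalence is an id comparison
satisfiable = (contradiction != 0)     # unsatisfiable functions reduce to terminal 0
```

Within one manager and one variable order, two functions are equivalent exactly when their root ids match, and a function is satisfiable exactly when its root is not the 0 terminal.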
••
04 Jun 2007
TL;DR: This work presents PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describes how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.
Abstract: Physical Unclonable Functions (PUFs) are innovative circuit primitives that extract secrets from physical characteristics of integrated circuits (ICs). We present PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describe how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.
2,014 citations
•
01 Jan 2007
1,944 citations