scispace - formally typeset
Search or ask a question
Author

Srinivas Devadas

Bio: Srinivas Devadas is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Sequential logic & Combinational logic. The author has an hindex of 88, co-authored 480 publications receiving 31897 citations. Previous affiliations of Srinivas Devadas include University of California, Berkeley & Cornell University.


Papers
More filters
Dissertation
01 Jan 2012
TL;DR: This thesis uses the dPool data structure within the parallel fos Physical Memory Allocation fleet and demonstrates that it is possible to use a dPool to manage shared state in a factored operating system's physical page allocator.
Abstract: Future computer architectures will likely exhibit increased parallelism through the addition of more processor cores. Architectural trends such as exponentially increasing parallelism and the possible lack of scalable shared memory motivate the reevaluation of operating system design. This thesis work takes place in the context of Factored Operating Systems which leverage distributed system ideas to increase the scalability of multicore processor operating systems. fos, a Factored Operating System, explores a new design point for operating systems where traditional low-level operating system services are fine-grain parallelized while internally only using explicit message passing for communication. fos factors an operating system first by system service and then further parallelizes inside of the system service by splitting the service into a fleet of server processes which communicate via messaging. Constructing parallel low-level operating system services which only internally use messaging is challenging because shared resources must be partitioned across servers and the services must provide scalable performance when met with uneven demand. To ease the construction of parallel fos system services, this thesis develops the dPool distributed data structure. The dPool data structure provides concurrent access to an unordered collection of elements by server processes within a fos fleet. Internal to a single dPool instance, all communication between different portions of a dPool is done via messaging. This thesis uses the dPool data structure within the parallel fos Physical Memory Allocation fleet and demonstrates that it is possible to use a dPool to manage shared state in a factored operating system's physical page allocator. This thesis begins by presenting the design of the prototype fos operating system. In the context of fos system service fleets, this thesis describes the dPool data structure, its design, different implementations, and interfaces. The dPool data structure is shown to achieve scalability across even and uneven micro-benchmark workloads. This thesis shows that common parallel and distributed programming techniques apply to the creation of dPool and that background threads within a dPool can increase performance. Finally, this thesis evaluates different dPool implementations and demonstrates that intelligently pushing elements between dPool parts can increase scalability. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

2 citations

Proceedings ArticleDOI
30 Mar 2010
TL;DR: This paper presents a pattern for developing parallel programs using systolic designs to execute efficiently (without resorting to simulation) on modern multicore processors featuring scalar operand networks and illustrates the application of this pattern to produce parallel implementations of matrix multiplication and convolution.
Abstract: Systolic arrays have long been used to develop custom hardware because they result in designs that are efficient and scalable. Many researchers have explored ways to exploit systolic designs in programmable processors; however, such efforts often result in the simulation of large systolic arrays on a general purpose platforms. While simulation can add flexibility and problem size independence, it comes at a cost of greatly reducing the efficiency of the original systolic approach. This paper presents a pattern for developing parallel programs using systolic designs to execute efficiently (without resorting to simulation) on modern multicore processors featuring scalar operand networks. This pattern provides a compromise solution that can achieve high efficiency and flexibility given appropriate hardware support. Several examples illustrate the application of this pattern to produce parallel implementations of matrix multiplication and convolution.

2 citations

22 May 2012
TL;DR: This paper develops a scalable, efficient shared memory architecture that enables seamless adaptation between private and logically shared caching at the fine granularity of cache lines and allows private caching for data blocks with high spatio-temporal locality.
Abstract: As transistor density continues to grow geometrically, processor manufacturers are already able to place a hundred cores on a chip (e.g., Tilera TILE-Gx 100), with massive multicore chips on the horizon. Programmers now need to invest more effort in designing software capable of exploiting multicore parallelism. The shared memory paradigm provides a convenient layer of abstraction to the programmer, but will current memory architectures scale to hundreds of cores? This paper directly addresses the question of how to enable scalable memory systems for future multicores. We develop a scalable, efficient shared memory architecture that enables seamless adaptation between private and logically shared caching at the fine granularity of cache lines. Our datacentric approach relies on in-hardware runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit on-chip cache capacity and enable low-latency memory access in large-scale multicores.

2 citations

Dissertation
01 Jan 2008
TL;DR: The results show that the proposed cache mechanisms show promise in improving cache performance and predictability with a modest increase in silicon area.
Abstract: Embedded systems are increasingly using on-chip caches as part of their on-chip memory system This thesis presents cache mechanisms to improve cache performance and provide opportunities to improve data availability that can lead to more predictable cache performance The first cache mechanism presented is an intelligent cache replacement policy that utilizes information about dead data and data that is very frequently used This mechanism is analyzed theoretically to show that the number of misses using intelligent cache replacement is guaranteed to be no more than the number of misses using traditional LRU replacement Hardware and software-assisted mechanisms to implement intelligent cache replacement are presented and evaluated The second cache mechanism presented is that of cache partitioning which exploits disjoint access sequences that do not overlap in the memory space A theoretical result is proven that shows that modifying an access sequence into a concatenation of disjoint access sequences is guaranteed to improve the cache hit rate Partitioning mechanisms inspired by the concept of disjoint sequences are designed and evaluated A profile-based analysis, annotation, and simulation framework has been implemented to evaluate the cache mechanisms This framework takes a compiled benchmark program and a set of program inputs and evaluates various cache mechanisms to provide a range of possible performance improvement scenarios The proposed cache mechanisms have been evaluated using this framework by measuring cache miss rates and Instructions Per Clock (IPC) information The results show that the proposed cache mechanisms show promise in improving cache performance and predictability with a modest increase in silicon area (Copies available exclusively from MIT Libraries, Rm 14-0551, Cambridge, MA 02139-4307 Ph 617-253-5668; Fax 617-253-1690)

2 citations

Proceedings ArticleDOI
01 May 1990
TL;DR: Methods for the synthesis of sequential circuits for high multiple fault testability are proposed and it is shown that the effects of multiple stuck-at faults on the state graph of a sequential circuit can be much more dramatic than the effect of single stuck- at faults.
Abstract: The effects of multiple stuck-at faults on sequential circuits are analyzed. It is shown that the effects of multiple stuck-at faults on the state graph of a sequential circuit can be much more dramatic than the effects of single stuck-at faults. Methods for the synthesis of sequential circuits for high multiple fault testability are proposed. >

2 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: TaintDroid as mentioned in this paper is an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data by leveraging Android's virtualized execution environment.
Abstract: Today’s smartphone operating systems frequently fail to provide users with visibility into how third-party applications collect and share their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid enables realtime analysis by leveraging Android’s virtualized execution environment. TaintDroid incurs only 32p performance overhead on a CPU-bound microbenchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, in our 2010 study we found 20 applications potentially misused users’ private information; so did a similar fraction of the tested applications in our 2012 study. Monitoring the flow of privacy-sensitive data with TaintDroid provides valuable input for smartphone users and security service firms seeking to identify misbehaving applications.

2,983 citations

Proceedings ArticleDOI
04 Oct 2010
TL;DR: Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, this work found 68 instances of misappropriation of users' location and device identification information across 20 applications.
Abstract: Today's smartphone operating systems frequently fail to provide users with adequate control over and visibility into how third-party applications use their private data. We address these shortcomings with TaintDroid, an efficient, system-wide dynamic taint tracking and analysis system capable of simultaneously tracking multiple sources of sensitive data. TaintDroid provides realtime analysis by leveraging Android's virtualized execution environment. TaintDroid incurs only 14% performance overhead on a CPU-bound micro-benchmark and imposes negligible overhead on interactive third-party applications. Using TaintDroid to monitor the behavior of 30 popular third-party Android applications, we found 68 instances of potential misuse of users' private information across 20 applications. Monitoring sensitive data with TaintDroid provides informed use of third-party applications for phone users and valuable input for smartphone security service firms seeking to identify misbehaving applications.

2,379 citations

Journal ArticleDOI
TL;DR: The OBDD data structure is described and a number of applications that have been solved by OBDd-based symbolic analysis are surveyed.
Abstract: Ordered Binary-Decision Diagrams (OBDDs) represent Boolean functions as directed acyclic graphs. They form a canonical representation, making testing of functional properties such as satisfiability and equivalence straightforward. A number of operations on Boolean functions can be implemented as graph algorithms on OBDD data structures. Using OBDDs, a wide variety of problems can be solved through symbolic analysis. First, the possible variations in system parameters and operating conditions are encoded with Boolean variables. Then the system is evaluated for all variations by a sequence of OBDD operations. Researchers have thus solved a number of problems in digital-system design, finite-state system analysis, artificial intelligence, and mathematical logic. This paper describes the OBDD data structure and surveys a number of applications that have been solved by OBDD-based symbolic analysis.

2,196 citations

Proceedings ArticleDOI
04 Jun 2007
TL;DR: This work presents PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describes how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.
Abstract: Physical Unclonable Functions (PUFs) are innovative circuit primitives that extract secrets from physical characteristics of integrated circuits (ICs). We present PUF designs that exploit inherent delay characteristics of wires and transistors that differ from chip to chip, and describe how PUFs can enable low-cost authentication of individual ICs and generate volatile secret keys for cryptographic operations.

2,014 citations

Proceedings Article
01 Jan 2007

1,944 citations