
Showing papers by "Srinivas Devadas published in 2021"


Proceedings ArticleDOI
18 Oct 2021
TL;DR: F1, as discussed by the authors, is the first programmable FHE accelerator, i.e., one capable of executing full FHE programs; it builds on an in-depth architectural analysis of the characteristics of FHE computations that reveals acceleration opportunities.
Abstract: Fully Homomorphic Encryption (FHE) allows computing on encrypted data, enabling secure offloading of computation to untrusted servers. Though it provides ideal security, FHE is expensive when executed in software, 4 to 5 orders of magnitude slower than computing on unencrypted data. These overheads are a major barrier to FHE’s widespread adoption. We present F1, the first FHE accelerator that is programmable, i.e., capable of executing full FHE programs. F1 builds on an in-depth architectural analysis of the characteristics of FHE computations that reveals acceleration opportunities. F1 is a wide-vector processor with novel functional units deeply specialized to FHE primitives, such as modular arithmetic, number-theoretic transforms, and structured permutations. This organization provides so much compute throughput that data movement becomes the key bottleneck. Thus, F1 is primarily designed to minimize data movement. Hardware provides an explicitly managed memory hierarchy and mechanisms to decouple data movement from execution. A novel compiler leverages these mechanisms to maximize reuse and schedule off-chip and on-chip data movement. We evaluate F1 using cycle-accurate simulation and RTL synthesis. F1 is the first system to accelerate complete FHE programs, and outperforms state-of-the-art software implementations by gmean 5,400× and by up to 17,000×. These speedups counter most of FHE’s overheads and enable new applications, like real-time private deep learning in the cloud.
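To make the named FHE primitives concrete, the sketch below shows a plain-software reference of two of them, modular arithmetic and the number-theoretic transform (NTT). This is only an illustrative O(n²) implementation with toy parameters (q = 17, n = 4, ω = 4, all chosen here for readability), not F1's specialized hardware pipeline.

```python
# Minimal software sketch of the NTT primitive over Z_q (toy parameters, not F1's datapath).

def ntt(a, omega, q):
    """Forward NTT: evaluate polynomial `a` at powers of `omega` modulo q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(a_hat, omega, q):
    """Inverse NTT: interpolate back, scaling by n^{-1} mod q (q prime, Fermat inverse)."""
    n = len(a_hat)
    n_inv = pow(n, q - 2, q)
    omega_inv = pow(omega, q - 2, q)
    return [(n_inv * sum(a_hat[j] * pow(omega_inv, i * j, q) for j in range(n))) % q
            for i in range(n)]

if __name__ == "__main__":
    q, n, omega = 17, 4, 4          # omega has multiplicative order n modulo q
    poly = [3, 1, 4, 1]
    assert intt(ntt(poly, omega, q), omega, q) == poly
```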

98 citations


Journal ArticleDOI
08 Feb 2021
TL;DR: In this paper, the gradient of rigid body dynamics is computed on a CPU, GPU, and FPGA, and the authors show that the relative performance across hardware platforms depends on the number of parallel gradient evaluations required.
Abstract: Computing the gradient of rigid body dynamics is a central operation in many state-of-the-art planning and control algorithms in robotics. Parallel computing platforms such as GPUs and FPGAs can offer performance gains for algorithms with hardware-compatible computational structures. In this letter, we detail the designs of three faster-than-state-of-the-art implementations of the gradient of rigid body dynamics on a CPU, GPU, and FPGA. Our optimized FPGA and GPU implementations provide as much as a 3.0x end-to-end speedup over our optimized CPU implementation by refactoring the algorithm to exploit its computational features, e.g., parallelism at different granularities. We also find that the relative performance across hardware platforms depends on the number of parallel gradient evaluations required.
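The batching pattern the letter studies, many independent gradient evaluations dispatched in parallel, can be sketched as below. The `dynamics_gradient` function here is a hypothetical placeholder standing in for the real CPU/GPU/FPGA kernels; the point is only that the coarse-grained, per-evaluation parallelism grows with the batch size, which is why relative performance depends on how many evaluations are requested at once.

```python
# Illustrative batching sketch; `dynamics_gradient` is a placeholder, not the authors' kernel.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def dynamics_gradient(state):
    """Stand-in for d(forward dynamics)/d(state) of a 7-joint arm."""
    q, qd, tau = state                      # joint positions, velocities, torques
    return np.outer(qd, q) + np.diag(tau)   # arbitrary stand-in computation

def batched_gradients(states, workers=4):
    """Coarse-grained parallelism: one independent gradient evaluation per task."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dynamics_gradient, states))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = [tuple(rng.standard_normal(7) for _ in range(3)) for _ in range(64)]
    grads = batched_gradients(batch)
    print(len(grads), grads[0].shape)       # 64 gradients, each 7x7
```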

22 citations


Proceedings ArticleDOI
19 Apr 2021
TL;DR: In this article, a methodology to transform robot morphology into a customized hardware accelerator morphology is presented, using robot topology and structure to exploit parallelism and matrix sparsity patterns in accelerator hardware.
Abstract: Robotics applications have hard time constraints and heavy computational burdens that can greatly benefit from domain-specific hardware accelerators. For the latency-critical problem of robot motion planning and control, there exists a performance gap of at least an order of magnitude between joint actuator response rates and state-of-the-art software solutions. Hardware acceleration can close this gap, but it is essential to define automated hardware design flows to keep the design process agile as applications and robot platforms evolve. To address this challenge, we introduce robomorphic computing: a methodology to transform robot morphology into a customized hardware accelerator morphology. We (i) present this design methodology, using robot topology and structure to exploit parallelism and matrix sparsity patterns in accelerator hardware; (ii) use the methodology to generate a parameterized accelerator design for the gradient of rigid body dynamics, a key kernel in motion planning; (iii) evaluate FPGA and synthesized ASIC implementations of this accelerator for an industrial manipulator robot; and (iv) describe how the design can be automatically customized for other robot models. Our FPGA accelerator achieves speedups of 8× and 86× over CPU and GPU when executing a single dynamics gradient computation. It maintains speedups of 1.9× to 2.9× over CPU and GPU, including computation and I/O round-trip latency, when deployed as a coprocessor to a host CPU for processing multiple dynamics gradient computations. ASIC synthesis indicates an additional 7.2× speedup for single computation latency. We describe how this principled approach generalizes to more complex robot platforms, such as quadrupeds and humanoids, as well as to other computational kernels in robotics, outlining a path forward for future robomorphic computing accelerators.
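One way to picture the robomorphic idea is that the robot's kinematic topology directly determines a sparsity pattern a customized accelerator can hard-wire. The sketch below derives such a mask from a joint parent array using the ancestor rule familiar from the joint-space inertia matrix; the branched parent array is a hypothetical example, not one of the paper's robot models.

```python
# Sketch: derive a hard-wirable sparsity mask from robot topology (illustrative only).

def ancestors(parent, i):
    """Return the set of ancestors of joint i (parent[i] == -1 marks the root)."""
    out = set()
    while parent[i] != -1:
        i = parent[i]
        out.add(i)
    return out

def sparsity_mask(parent):
    """mask[i][j] is True when joints i and j lie on a common root-to-leaf path,
    i.e., when the corresponding matrix entry can be structurally nonzero."""
    n = len(parent)
    anc = [ancestors(parent, i) | {i} for i in range(n)]
    return [[(i in anc[j]) or (j in anc[i]) for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    serial_chain = [-1, 0, 1, 2, 3, 4]   # e.g., a 6-joint manipulator
    branched     = [-1, 0, 1, 0, 3]      # hypothetical tree with two branches
    for name, topo in [("chain", serial_chain), ("tree", branched)]:
        mask = sparsity_mask(topo)
        zeros = sum(not x for row in mask for x in row)
        print(f"{name}: {zeros} structurally-zero entries of {len(topo)**2}")
```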

20 citations