scispace - formally typeset
Search or ask a question

Showing papers by "Ken Mai published in 2010"


Proceedings ArticleDOI
04 Dec 2010
TL;DR: In this article, the relative merits between different approaches in the face of technology constraints are analyzed for U-cores and the predictive power of their model depends upon U-core-specific parameters derived by measuring performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs.
Abstract: To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can also diminish given the scarcity of projected bandwidth in the future. To understand the relative merits between different approaches in the face of technology constraints, this work builds on prior modeling of heterogeneous multicores to support U-cores. Unlike prior models that trade performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less-obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation of future designs from scaling trends predicted by the ITRS road map. The predictive power of our model depends upon U-core-specific parameters derived by measuring performance and power of tuned applications on today's state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provides new insights on their relative merits.

249 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A sense amplifier based PUF (SA-PUF) structure that generates random bits with increased reliability, resulting in significantly fewer errors in response bits is proposed, which eliminates the need of complex and costly ECC circuitry in cryptographic applications.
Abstract: Physically Unclonable Functions (PUFs) implement die specific random functions that offer a promising mechanism in various security applications. Stability or reliability of a PUF response is a key concern, especially when the IC containing the PUF is subjected to severe environmental variations. In cryptographic applications, errors in response bits need to be completely corrected and this is often done using costly error correction codes (ECC). In identification and authentication applications however, a complete correction of response bits is not necessary and hence costly ECC schemes can be avoided. On the flip side, a response with faulty bits cannot be post-conditioned by one-way functions, resulting in an increased vulnerability to modeling attacks. We propose a sense amplifier based PUF (SA-PUF) structure that generates random bits with increased reliability, resulting in significantly fewer errors in response bits. This eliminates the need of complex and costly ECC circuitry in cryptographic applications. Further, with the reduced cost of ECC implementation, the use of one-way functions to post-condition the outputs becomes more viable even in identification and authentication applications, thereby increasing their resilience to modeling based attacks. Finally, SA-PUF elements are inherently more resilient to environmental changes as compared to most of the earlier proposed silicon based PUF structures. Simulation data in 65nm bulk CMOS industrial process show that SA-based PUFs have 2.5x-3.5x lower errors compared to other PUF implementations when subjected to similar environmental variations.

57 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: A custom ROM-based S-Box implementation that can achieve comparable throughput to logic-based implementations, yet is smaller in both area and power, which is key in defending against power analysis side-channel attacks.
Abstract: In the AES algorithm, the Substitution Box (S-Box) often dominates the area and delay of implementations. The S-Box performs a byte-wise substitution on the data based on an established code book, and most AES algorithm implementations use a large complex logic block consisting mainly of XORs to implement the S-Box. Direct implementation of the S-Box with a look-up table (LUT) has been eschewed due to difficulty in pipelining the structure, hence restricting the throughput. However, we present a custom ROM-based S-Box implementation that can achieve comparable throughput to logic-based implementations, yet is smaller in both area and power. Additionally, the symmetrical nature of the ROM is well suited towards achieving data-independent power dissipation, which is key in defending against power analysis side-channel attacks. We present both power-analysis hardened and unhardened ROM-based S-Box designs which significantly outperform logic-based designs in area, power, performance, and power-analysis resistance.

13 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: Virtual Prototyper (ViPro) is introduced, a tool that helps SRAM designers explore the large design space by rapidly generating optimized virtual prototypes of complete SRAM macros by incorporating them into a hierarchical model that captures circuit and architectural features of the SRAM to optimize a complete prototype.
Abstract: SRAM design in scaled technologies requires knowledge of phenomena at the process, circuit, and architecture level. Decisions made at various levels of the design hierarchy affect the global figures of merit (FoMs) of an SRAM, such as, performance, power, area, and yield. However, the lack of a quick mechanism to understand the impact of changes at various levels of the hierarchy on global FoMs makes an accurate assessment of SRAM design innovations difficult. Thus, we introduce Virtual Prototyper (ViPro), a tool that helps SRAM designers explore the large design space by rapidly generating optimized virtual prototypes of complete SRAM macros. It does so by allowing designers to describe the SRAM components with varying levels of detail and by incorporating them into a hierarchical model that captures circuit and architectural features of the SRAM to optimize a complete prototype. It generates base-case prototypes that provide starting points for design space exploration, and assesses the impact of process, circuit, and architectural changes on the overall SRAM macro design.

10 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper compares four promising power-analysis-resistant digital logic styles against a standard CMOS baseline and presents a clearer picture of the advantages and drawbacks of each.
Abstract: Power analysis attacks are a common and effective method of defeating cryptographic systems. Many power-analysis-resistant digital circuit techniques have been previously proposed, leaving the circuit designer a myriad of choices without a simple way to compare and contrast the strengths and weaknesses of each technique. In this paper, we compare four promising power-analysis-resistant digital logic styles against a standard CMOS baseline. By comparing these techniques side by side in a consistent manner we present a clearer picture of the advantages and drawbacks of each. Results are presented for logic gate area, energy consumption, and power-analysis resistance. We also present a novel test structure suitable for measuring power-analysis resistance of individual logic gates in actual silicon.

5 citations