Author

Wenfeng Zhao

Bio: Wenfeng Zhao is an academic researcher at the University of Minnesota. The author has contributed to research on topics including compressed sensing and data compression. The author has an h-index of 11 and has co-authored 34 publications receiving 391 citations. Previous affiliations of Wenfeng Zhao include Binghamton University and the National University of Singapore.

Papers
Proceedings Article
01 Jan 2015
TL;DR: In silicon PUFs, trustworthy bit generation is achieved by accentuating local process variations through various circuit principles and rejecting global process/voltage/temperature (PVT) variations, layout-dependent process variations and noise.
Abstract: Physically unclonable functions (PUFs) enable information security down to the chip level [1-4]. Arrays of PUF bitcells (Fig. 14.3.1) generate chip-specific keys that are unpredictable, repeatable and cannot be measured externally, thus uniquely identifying the die to counteract chip piracy/counterfeiting and enable lightweight authentication/encryption [1-4]. In silicon PUFs, trustworthy bit generation is achieved by accentuating local process variations through various circuit principles (e.g., delay mismatch) and rejecting global process/voltage/temperature (PVT) variations, layout-dependent process variations and noise [2].

83 citations
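The principle in the abstract above (accentuate local mismatch, reject global variation) can be illustrated with a toy Python model of a delay-mismatch PUF bit. This is a minimal sketch under stated assumptions, not the circuit described in the paper: each bit compares two nominally identical delay paths, each path carries a fixed Gaussian local mismatch drawn once at "fabrication", and a global PVT shift is modeled as a common positive factor on both paths. Noise is not modeled, and the function names and parameters are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical toy model: each bitcell stores the static local mismatch of its
# two nominally identical delay paths (A and B), fixed once at "fabrication".
Chip = List[Tuple[float, float]]


def fabricate_chip(n_bits: int, sigma_local: float, seed: int) -> Chip:
    """Draw an independent Gaussian local mismatch for both paths of every bitcell."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma_local), rng.gauss(0.0, sigma_local))
            for _ in range(n_bits)]


def read_key(chip: Chip, global_scale: float = 1.0) -> List[int]:
    """Read one key: bit = 1 if path A is slower than path B.

    The nominal path delay is common to both paths and cancels in the
    difference; a global PVT shift (global_scale) only scales the local
    mismatch difference, so its sign, and hence the bit, is unchanged.
    """
    if global_scale <= 0:
        raise ValueError("global_scale must be positive")
    key = []
    for mismatch_a, mismatch_b in chip:
        delay_diff = global_scale * (mismatch_a - mismatch_b)
        key.append(1 if delay_diff > 0 else 0)
    return key


if __name__ == "__main__":
    chip = fabricate_chip(n_bits=64, sigma_local=0.05, seed=1)
    # Same chip read at nominal conditions and at a hypothetical 30% slower corner.
    print(read_key(chip, 1.0) == read_key(chip, 1.3))  # True in this noise-free model
```

Different seeds stand in for different dies, so their keys differ, while the same die gives the same key at any global corner. Real bitcells also see thermal noise and non-uniform (layout-dependent) variation, which this sketch deliberately leaves out.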

Journal Article
TL;DR: A novel class of mono-stable static physically unclonable functions (PUFs) for secure key generation and chip identification is proposed and is demonstrated through a 65 nm prototype that contains two different implementations, as well as several previously proposed PUFs to enable a fair comparison at iso-technology.
Abstract: A novel class of mono-stable static physically unclonable functions (PUFs) for secure key generation and chip identification is proposed. The fundamental concept is demonstrated through a 65 nm prototype that contains two different implementations, as well as several previously proposed PUFs to enable a fair comparison at iso-technology. From a statistical quality viewpoint, the achieved reproducibility and uniqueness are quantified by an intra-PUF Hamming distance (HD) lower than 1 and an inter-PUF HD of 128.35, for a 256-bit PUF output key. The keys generated by the proposed PUF pass all applicable NIST randomness tests. The measured energy per bit is as low as 15 fJ/bit. Native unstable bits are less than 2% at nominal conditions, less than 5% at 0.6–1 V and less than 6% in worst case scenario of 0.6 V voltage and 85 °C temperature, before applying any further post-silicon technique for stability enhancement.

63 citations
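The reproducibility and uniqueness figures quoted in the abstract above are Hamming-distance (HD) statistics. The Python sketch below assumes the usual definitions: intra-PUF HD compares repeated readouts of one chip against a reference readout of the same chip (ideal value 0), while inter-PUF HD averages the pairwise distance between keys of different chips (ideal value half the key length, i.e. 128 for a 256-bit key). The paper's exact measurement protocol is not given here, and the function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Sequence

Key = Sequence[int]  # one PUF readout: a sequence of 0/1 bits


def hamming_distance(a: Key, b: Key) -> int:
    """Number of bit positions in which two equal-length keys differ."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_intra_hd(reference: Key, repeats: Sequence[Key]) -> float:
    """Average HD between repeated readouts of one chip and its reference key.

    Measures reproducibility; the ideal value is 0.
    """
    return sum(hamming_distance(reference, r) for r in repeats) / len(repeats)


def mean_inter_hd(keys: Sequence[Key]) -> float:
    """Average pairwise HD between keys from different chips.

    Measures uniqueness; the ideal value is half the key length.
    """
    pairs = list(combinations(keys, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical ideal keys: 10 chips, 256 unbiased independent bits each.
    rng = random.Random(0)
    chips = [[rng.randint(0, 1) for _ in range(256)] for _ in range(10)]
    print(mean_inter_hd(chips))  # should land near 128 for ideal random keys
```

Against these definitions, the reported intra-PUF HD below 1 and inter-PUF HD of 128.35 sit close to the respective ideals of 0 and 128.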

Journal Article
TL;DR: This brief introduces a novel LS circuit with NMOS-diode-based current limiter for current contention reduction to achieve robust and efficient level conversion and explores the inverse narrow width effect to increase the drivability of the pull-down devices for delay reduction.
Abstract: Level shifters (LS) are crucial interface circuits for multisupply voltage designs, and it is challenging to achieve both robust and efficient level conversion from subthreshold to above threshold. In this brief, we propose two circuit techniques for a novel subthreshold LS with wide conversion range. First, we introduce a novel LS circuit with NMOS-diode-based current limiter for current contention reduction to achieve robust and efficient level conversion. Second, we explore the inverse narrow width effect to increase the drivability of the pull-down devices for delay reduction. When implemented in a commercial 65-nm MTCMOS process, the proposed LS achieves robust conversion from deep subthreshold (sub-100 mV) to nominal supply voltage (1.2 V). For the target conversion from 0.3 to 1.2 V, the proposed LS shows on average 25.1-ns propagation delay, 30.7-fJ energy efficiency, and 2.5-nW leakage power across 25 test chips.

59 citations

Journal Article
TL;DR: It is demonstrated that the proposed CS encoders lead to comparable recovery performance and efficient VLSI architecture designs are proposed for QCAC-CS and $(1,s)$-SRBM encoder designs with reduced area and total power consumption.
Abstract: On-chip neural data compression is an enabling technique for wireless neural interfaces that suffer from insufficient bandwidth and power budgets to transmit the raw data. The data compression algorithm and its implementation should be power and area efficient and functionally reliable over different datasets. Compressed sensing is an emerging technique that has been applied to compress various neurophysiological data. However, state-of-the-art compressed sensing (CS) encoders leverage random but dense binary measurement matrices, which incur substantial implementation costs in both power and area that could offset the benefits of the reduced wireless data rate. In this paper, we propose two CS encoder designs based on sparse measurement matrices that could lead to efficient hardware implementation. Specifically, two different approaches for the construction of sparse measurement matrices, i.e., the deterministic quasi-cyclic array code (QCAC) matrix and the $(1,s)$-sparse random binary matrix [$(1,s)$-SRBM], are exploited. We demonstrate that the proposed CS encoders lead to comparable recovery performance, and we propose efficient VLSI architecture designs for the QCAC-CS and $(1,s)$-SRBM encoders with reduced area and total power consumption.

42 citations
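A sparse binary measurement matrix lets the encoder compute the compressed measurements $y = \Phi x$ with additions only, which is where the hardware savings described in the abstract above come from. The Python sketch below assumes that a $(1,s)$-SRBM places exactly $s$ ones at random rows of each column, with all other entries zero; it illustrates that general idea and is not the paper's VLSI architecture. The function names are hypothetical. Storing the matrix column by column matches a streaming encoder: each incoming sample $x_j$ is added into the $s$ accumulators selected by column $j$, so the cost is $s$ additions per sample.

```python
import random
from typing import List, Sequence


def make_sparse_binary_matrix(m: int, n: int, s: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical (1,s)-SRBM construction: for each of the n columns,
    pick s distinct row indices (out of m) that hold a 1.

    Only the row indices of the ones are stored, column by column.
    """
    if not 1 <= s <= m:
        raise ValueError("need 1 <= s <= m")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(m), s)) for _ in range(n)]


def cs_encode(x: Sequence[float], cols: List[List[int]], m: int) -> List[float]:
    """Compute y = Phi @ x using additions only.

    Sample x[j] is added to the s accumulators listed in cols[j],
    exactly as a sample-by-sample streaming encoder would do it.
    """
    if len(x) != len(cols):
        raise ValueError("x must have one sample per matrix column")
    y = [0.0] * m
    for x_j, rows in zip(x, cols):
        for r in rows:
            y[r] += x_j
    return y


if __name__ == "__main__":
    m, n, s = 16, 64, 2          # hypothetical sizes: 64 samples -> 16 measurements
    cols = make_sparse_binary_matrix(m, n, s, seed=1)
    x = [float(j % 7) for j in range(n)]  # placeholder input window
    print(cs_encode(x, cols, m))
```

By contrast, a dense random binary matrix with equiprobable 0/1 entries needs about $M/2$ additions per sample on average for $M$ measurements, so for $s \ll M/2$ the sparse encoder does far less work per sample.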

Journal Article
TL;DR: This paper introduces a novel body-biasing technique to mitigate the performance loss at near-threshold voltages while not requiring any additional circuitry for the body-bias control, thereby minimizing the design effort and simplifying the systems-on-chip integration.
Abstract: Near-threshold operation enables high energy efficiency, but requires proper design techniques to deal with performance loss and increased sensitivity to process variations. In this paper, we address both issues with two synergistic approaches. First, we introduce a novel body-biasing technique to mitigate the performance loss at near-threshold voltages while not requiring any additional circuitry for the body-bias control, thereby minimizing the design effort and simplifying the systems-on-chip integration. Second, we introduce a novel statistical design methodology to efficiently and accurately evaluate the design guardband strictly needed in the worst case, thereby keeping the area cost of variations at its very minimum. A 65-nm advanced encryption standard testchip demonstrates $1.65\times $ throughput improvement over a baseline design without body biasing, and enables reliable operation over a wide voltage range (0.5–1.2 V) as opposed to traditional body-biasing schemes. In addition, our testchip achieves $1.63\times $ area efficiency improvement compared with a design based on corner analysis. Accordingly, the proposed techniques are well suited for the design of near-threshold specialized hardware with improved performance, reduced silicon area, and design effort.

37 citations


Cited by
Journal Article

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal Article
TL;DR: A thorough evaluation of five of the latest types of modern GPU interconnects on six high-end servers and HPC platforms shows that, for an application running in a multi-GPU node, choosing the right GPU combination can have a considerable impact on GPU communication efficiency, as well as on the application's overall performance.
Abstract: High-performance multi-GPU computing has become an inevitable trend due to the ever-increasing demand for computation capability in emerging domains such as deep learning, big data and planet-scale simulations. However, the lack of a deep understanding of how modern GPUs can be connected, and of the real impact of state-of-the-art interconnect technology on multi-GPU application performance, has become a hurdle. In this paper, we fill the gap by conducting a thorough evaluation of five of the latest types of modern GPU interconnects: PCIe, NVLink-V1, NVLink-V2, NVLink-SLI and NVSwitch, on six high-end servers and HPC platforms: NVIDIA P100-DGX-1, V100-DGX-1, DGX-2, OLCF's SummitDev and Summit supercomputers, as well as an SLI-linked system with two NVIDIA Turing RTX-2080 GPUs. Based on the empirical evaluation, we have observed four new types of GPU communication network NUMA effects: three are triggered by NVLink's topology, connectivity and routing, while one is caused by a PCIe chipset design issue. These observations indicate that, for an application running in a multi-GPU node, choosing the right GPU combination can have a considerable impact on GPU communication efficiency, as well as on the application's overall performance. Our evaluation can be leveraged in building practical multi-GPU performance models, which are vital for GPU task allocation, scheduling and migration in a shared environment (e.g., AI cloud and HPC centers), as well as for communication-oriented performance tuning.

118 citations

Proceedings Article
Bohdan Karpinskyy, Yong Ki Lee, Choi Yunhyeok, Yong-Soo Kim, Mi-Jung Noh, Sanghyun Lee
25 Feb 2016
TL;DR: A PUF structure based on the threshold voltage (Vth) difference of inverting logic gates is presented, which is implemented for secure 24b key generation in a 45nm smart card chip and achieves an error rate as low as $2.01\times 10^{-38}$.
Abstract: Physically unclonable function (PUF) circuits are for generating unique secure keys or chip IDs based on intrinsic properties of each chip itself [1–2]. PUFs are a step forward to improve the security level compared to traditional NVM (non-volatile memory) solutions (FUSEs, EEPROM/FLASH, etc.) because they resolve security issues, such as active data-probing, transferring the security key from outside, etc. Since the MOSFET mismatch (e.g. size, doping concentration, mobility and oxide thickness) due to process variations cannot be fully controlled, PUFs, which are based on such phenomena, cannot be replicated. Unfortunately, the PUF output is erroneous by nature, as caused by thermal noise, voltage/temperature influence, aging effects, etc. The stability issue must be overcome since standard security applications, such as data encryption and digital signatures, have zero error-tolerance. In this work, a PUF structure based on the threshold voltage (Vth) difference of inverting logic gates is presented, which is implemented for secure 24b key generation in a 45nm smart card chip. The key is used as part of an encryption key and achieves an error rate as low as $2.01\times 10^{-38}$. The PUF system is also scalable for a larger key size.

98 citations
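The zero-error-tolerance requirement in the abstract above explains why bit stability matters so much: a key is only usable if every one of its bits is reproduced correctly. As a minimal sketch, assuming each bit flips independently with the same probability $p$ and no error correction is applied (simplifying assumptions, not the paper's model), the probability that an $n$-bit key contains at least one error is $1-(1-p)^n$. The function name below is hypothetical.

```python
import math


def key_error_rate(p_bit: float, n_bits: int) -> float:
    """Probability that an n-bit key has at least one flipped bit.

    Assumes independent, identically distributed bit errors and no ECC.
    Computed as -expm1(n * log1p(-p)) so that tiny rates stay accurate
    instead of collapsing to 0.0 through 1 - (1 - p)**n.
    """
    if not 0.0 <= p_bit <= 1.0:
        raise ValueError("p_bit must be a probability")
    if n_bits < 0:
        raise ValueError("n_bits must be non-negative")
    if p_bit == 1.0:
        # log1p(-1.0) is undefined; every bit flips, so any non-empty key fails.
        return 1.0 if n_bits > 0 else 0.0
    return -math.expm1(n_bits * math.log1p(-p_bit))


if __name__ == "__main__":
    # Hypothetical per-bit error rates for a 128-bit key.
    for p in (1e-2, 1e-4, 1e-6):
        print(f"p_bit={p:g}: key error rate ~ {key_error_rate(p, 128):.3g}")
```

Under these assumptions, even a per-bit error rate of $10^{-2}$ makes roughly 72% of 128-bit key readouts fail, which illustrates why the stability issue must be overcome before a PUF output can serve as a key.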