Author

Dawei Li

Bio: Dawei Li is an academic researcher from Northwestern University. The author has contributed to research in topics: Chip & Content-addressable memory. The author has an h-index of 4 and has co-authored 9 publications receiving 36 citations.

Papers
Journal ArticleDOI
TL;DR: This paper characterizes the interplay between power consumption and performance of a matchline-based Content Addressable Memory and then proposes the use of a multi-Vdd design to save power and increase post-fabrication tunability.
Abstract: In this paper, we characterize the interplay between power consumption and performance of a matchline-based Content Addressable Memory and then propose the use of a multi-Vdd design to save power and increase post-fabrication tunability. Exploration of the power consumption behavior of a CAM chip shows the drastically different behavior among the components and suggests the use of different and independent power supplies. The complete design, simulation and testing of a multi-Vdd CAM chip along with an exploration of the multi-Vdd design space are presented. Our analysis has been applied to simulated models on two different technology nodes (130 nm and 45 nm), followed by experiments on a 246-kb test chip fabricated in 130 nm Global Foundries Low Power CMOS technology. The proposed design, operating at an optimal operating point in a triple-Vdd configuration, increases the power-delay operation range by 24 times and consumes 253% less dynamic power when compared to a conventional single-Vdd design operating over the same voltage range with equivalent noise margin. Our multi-Vdd design also helps save 513% standby power. Measurement results from the test chip combined with the simulation analysis at the two nodes validate our thesis.

13 citations

Proceedings ArticleDOI
07 Nov 2013
TL;DR: This paper presents a novel architecture for embedding bi-metallic thermocouple based temperature sensors into 3D IC stacks and proposes a low cost solution by leveraging a fraction of existing thermal TSVs for this purpose.
Abstract: In this paper, we present a novel architecture for embedding bi-metallic thermocouple based temperature sensors into 3D IC stacks. To the best of our knowledge this is the first work addressing this specific integration problem. Our architecture uses dedicated vias to thermally couple sensors in the metal layer with the hotspots to be monitored in the active layer throughout the multi-stack structures. We propose a low cost solution by leveraging a fraction of existing thermal TSVs for this purpose. Through thermal modeling and simulation using a state-of-the-art tool (FloTHERM), we demonstrate that we can achieve high accuracy (less than 1°C error) in temperature tracking while still maintaining the effectiveness of the thermal TSVs in heat management (conforming to a fixed peak temperature threshold of 95°C).

9 citations

Journal ArticleDOI
TL;DR: A new theorem and its formal proof are presented; the theorem is the key enabler to achieving a provably optimal solution for configuring bias current levels of TEC devices.
Abstract: We established a novel theoretical analysis framework for optimizing the cooling system configuration of chips employing thermoelectric cooling (TEC) elements by extending the theory of inverse-positive matrices and the eigenvalue/eigenvector theory in linear algebra. In this brief, we present a new theorem and its formal proof, which is the key enabler to achieving a provably optimal solution for configuring bias current levels of TEC devices.

8 citations

Journal ArticleDOI
TL;DR: This paper proposes a low-overhead design methodology by linking the sensor placement task with the existing thermal TSV planning phase for 3-D ICs, and demonstrates that it can achieve high accuracy (1 °C error) in temperature tracking while still maintaining the effectiveness of the thermal TSVs in heat management.
Abstract: Solutions to the integration challenges of a new thermal sensor technology into 3-D integrated circuits (ICs) will be discussed in this paper. Our proposed architecture uses bimetallic thin-film thermocouples, which are thermally linked to points of measurement throughout the 3-D stack with dedicated vias. These vias will be similar to thermal through-silicon vias (TSVs) in structure, yet different in functionality. We propose a low-overhead design methodology by linking the sensor placement task with the existing thermal TSV planning phase for 3-D ICs. A fraction of thermal TSV resources is decoupled from their original use and repurposed for the temperature sensing infrastructure. Tradeoffs concerning the reduction of the thermal TSV resources are investigated. Furthermore, we present an end-to-end system, including the physical realization of the sensor network as well as its analog interface circuitry with the sensor data sampling unit. We demonstrate the operation and correctness of this interface with transistor-level simulations. Next, through thermal modeling and simulation using a state-of-the-art tool (FloTHERM), we demonstrate that we can achieve high accuracy (1 °C error) in temperature tracking while still maintaining the effectiveness of the thermal TSVs in heat management (conforming to a peak temperature constraint of 95 °C).

7 citations

Proceedings ArticleDOI
18 Jun 2017
TL;DR: There is an optimal operating point where, with a reduced clock frequency, processor cores would actually recover any performance loss induced by DRAM refresh and at the same time the cache energy consumption could be optimized.
Abstract: We describe an adaptive thermal management system for 3D-ICs with stacked DRAM cache memories. We present a detailed analysis of the impact of 3D-IC hotspot aggregation on the refresh behavior of the stacked DRAM-based L3 cache. We also present the consequence of the refresh variation on the overall system performance and cache energy consumption. Our analysis demonstrates that memory intensive applications are influenced more strongly by the DRAM refresh variation. We show that there is an optimal operating point where, with a reduced clock frequency, processor cores would actually recover any performance loss induced by DRAM refresh and at the same time the cache energy consumption could be optimized. We propose a low overhead run-time method that can identify the best CPU frequency modulation factor to cool the system to minimize accelerated refresh rates in the DRAM caches. Our system can provide a customizable trade-off between performance of the processor and energy savings of the memory.

5 citations


Cited by
Journal ArticleDOI
TL;DR: This paper shows that the thermal impact on 3D processors is manageable by adopting thermal-aware techniques, thus bringing 3D processors into the mainstream in the near future.

72 citations

Journal ArticleDOI
Zheyao Wang
TL;DR: The fundamental fabrication technologies of 3D integration are introduced, recent progress in MEMS and microsystems using 3D integration and TSV technologies is reviewed, and conclusions and future trends are discussed.

45 citations

Journal ArticleDOI
TL;DR: In this paper, a scalable and parametric analysis of thermoelectric micro-coolers for hotspot cooling based on analytic formulations is presented, where the authors investigate two design cases: 1) the maximum cooling and 2) the minimum drive current for a given hotspot temperature to obtain an improved coefficient-of-performance (COP).

19 citations

Journal ArticleDOI
TL;DR: A novel adaptive DTM framework for heterogeneous multi-core processors, which utilizes the big and small cores to prevent performance degradation and improves the average performance by 8.9 percent, compared to ARM's DVFS-based IPA (Intelligent Power Allocation), satisfying thermal constraints.
Abstract: Off-the-shelf embedded systems have adopted heterogeneous multi-core processors which have high-performance big cores and low-power small cores. Though there are two different types of cores in heterogeneous multi-core processors, conventional DVFS (Dynamic Voltage and Frequency Scaling)-based DTM (Dynamic Thermal Management) techniques do not utilize the different types of cores to cool down hot cores. Rather, they primarily reduce the voltage and frequency of the hot cores, leading to performance degradation. In this article, we propose a novel adaptive DTM framework for heterogeneous multi-core processors, which utilizes the big and small cores to prevent performance degradation. Our proposed framework exploits two migration-based DTM techniques: 1) a technique (denoted as Migrationbig↔big) that migrates applications from hot big cores (big cores whose temperature is above a pre-defined threshold) to cold big cores (big cores whose temperature is below the threshold) and 2) a technique (denoted as Migrationbig↔small) that migrates all applications from the big cores to the small cores. In case of a thermal emergency on the big cores, our proposed framework checks the number of cold big cores. When cold big cores are available, our proposed framework employs Migrationbig↔big to cool down the hot big cores without reducing the big core frequency. On the other hand, when no cold big core is available, our proposed framework employs either Migrationbig↔small or a DVFS-based DTM technique, whichever is expected to result in better performance. In our experiments on an embedded development board, our proposed framework improves the average performance by 8.9 percent compared to ARM's DVFS-based IPA (Intelligent Power Allocation), while satisfying thermal constraints. Our framework also improves the average performance by 10.4 percent compared to a state-of-the-art predictive DVFS-based DTM technique.

18 citations
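
The decision flow described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `BigCore` and `choose_dtm_action`, the action labels, and the two performance estimates passed in as plain numbers are all hypothetical, and the abstract does not say how those estimates are obtained.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BigCore:
    """Hypothetical record for one big core and its current temperature."""
    core_id: int
    temperature_c: float


def choose_dtm_action(
    big_cores: List[BigCore],
    threshold_c: float,
    est_perf_small_migration: float,
    est_perf_dvfs: float,
) -> str:
    """Pick a thermal-management action following the abstract's description.

    The two est_perf_* arguments are assumed inputs (higher is better);
    how a real framework would predict them is not covered here.
    """
    # Hot cores are above the threshold, cold cores are below it.
    hot = [c for c in big_cores if c.temperature_c > threshold_c]
    cold = [c for c in big_cores if c.temperature_c < threshold_c]

    if not hot:
        # No thermal emergency on the big cores.
        return "none"
    if cold:
        # A cold big core is available: migrate without lowering frequency.
        return "migrate_big_to_big"
    # No cold big core: choose whichever option is expected to perform better.
    if est_perf_small_migration >= est_perf_dvfs:
        return "migrate_big_to_small"
    return "dvfs"
```

For example, with one core at 90 °C, one at 60 °C, and a threshold of 80 °C, the sketch selects the big-to-big migration regardless of the two performance estimates.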

Journal ArticleDOI
TL;DR: This paper presents a power and resource efficient binary CAM architecture, Zi-CAM, which consumes less power and uses fewer resources than the available architectures of SRAM-based CAM on FPGAs.
Abstract: Content-addressable memory (CAM) is a type of associative memory, which returns the address of a given search input in one clock cycle. Many designs are available to emulate the CAM functionality inside the re-configurable hardware, field-programmable gate arrays (FPGAs), using static random-access memory (SRAM) and flip-flops. FPGA-based CAMs are becoming popular due to the rapid growth in software defined networks (SDNs), which use CAM for packet classification. Emulated designs of CAM consume much dynamic power owing to a high amount of switching activity and computation involved in finding the address of the search key. In this paper, we present a power and resource efficient binary CAM architecture, Zi-CAM, which consumes less power and uses fewer resources than the available architectures of SRAM-based CAM on FPGAs. Zi-CAM consists of two main blocks. The RAM block (RB) is activated when there is a sequence of repeating zeros in the input search word; otherwise, the lookup tables (LUT) block (LB) is activated. Zi-CAM is implemented on a Xilinx Virtex-6 FPGA for the size 64 × 36, improving power consumption and hardware cost by 30 and 32%, respectively, compared to the available FPGA-based CAMs.

14 citations
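
As a rough behavioral illustration of the abstract above, the sketch below models a binary CAM lookup and a zero-run check that picks between the two blocks. It is a software model under assumptions, not the Zi-CAM hardware: the class `BinaryCAM`, the functions `longest_zero_run` and `select_block`, and the `min_zero_run` parameter are hypothetical, since the abstract does not state how long a run of zeros must be to activate the RAM block. The sequential loop in `search` stands in for the parallel comparison a hardware CAM performs.

```python
from typing import List, Optional


def longest_zero_run(word: int, width: int) -> int:
    """Return the length of the longest run of consecutive 0 bits in `word`."""
    longest = current = 0
    for bit in range(width):
        if (word >> bit) & 1 == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def select_block(word: int, width: int, min_zero_run: int) -> str:
    """Choose "RB" when the word has a long enough zero run, else "LB".

    `min_zero_run` is an assumed tuning parameter, not taken from the source.
    """
    return "RB" if longest_zero_run(word, width) >= min_zero_run else "LB"


class BinaryCAM:
    """Minimal binary CAM model: search by content, get back an address."""

    def __init__(self, depth: int, width: int) -> None:
        self.width = width
        self.entries: List[Optional[int]] = [None] * depth

    def write(self, address: int, word: int) -> None:
        if not 0 <= word < (1 << self.width):
            raise ValueError("word does not fit in the configured width")
        self.entries[address] = word

    def search(self, key: int) -> Optional[int]:
        # Hardware compares every entry at once; this loop is only a model.
        for address, word in enumerate(self.entries):
            if word == key:
                return address
        return None
```

A usage sketch, reading the reported 64 × 36 size as 64 words of 36 bits (an assumption): `cam = BinaryCAM(depth=64, width=36)`, then `cam.write(5, 0xABC)` followed by `cam.search(0xABC)` yields the stored address.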