Author

J. Brandon Dwiel

Bio: J. Brandon Dwiel is an academic researcher from North Carolina State University. The author has contributed to research in the topics Computing with Memory and SIMD. The author has an h-index of 2 and has co-authored 2 publications receiving 7 citations.

Papers
Proceedings ArticleDOI
30 Nov 2015
TL;DR: 3D technologies offer significant potential to improve total performance and performance per unit of power, and the next frontier is to create sophisticated logic on logic solutions that promise further increases in performance/power beyond those attributable to memory interfaces alone.
Abstract: 3D technologies offer significant potential to improve total performance and performance per unit of power. After exploiting TSV technologies for cost reduction and increasing memory bandwidth, the next frontier is to create sophisticated logic on logic solutions that promise further increases in performance/power beyond those attributable to memory interfaces alone. These include heterogeneous integration for computing and exploitation of the high amounts of 3D interconnect available to reduce total interconnect power. Challenges include access for prototype quantities and the design of sophisticated static and dynamic thermal management methods and technologies, as well as test.

4 citations

Proceedings ArticleDOI
23 Nov 2015
TL;DR: The concept of Fast Thread Migration using 3DIC technologies is introduced, and the design of a power-optimized SIMD unit in which over half of the power is spent in the FP units is presented.
Abstract: 3DIC technology refers to stacking and interconnecting chips and substrates ("interposers") with Through Silicon Vias (TSVs). Industry is gearing up for widespread introduction of this technology at the 22 nm node. We have been pursuing a range of approaches to enable low-power computing. As well as 3DIC, these include heterogeneous computing, power-optimized SIMD units, optimized memory hierarchies, and MPI with post-silicon customized interconnect. Heterogeneous computing refers to the concept of building a mix of CPUs and memories that in turn enables in-situ tuning of the compute load to the compute resources. We introduce the concept of Fast Thread Migration using 3DIC technologies. We present the design of a power-optimized SIMD unit in which over half of the power is spent in the FP units. A parallel computer is built using an MPI paradigm. Codes are analyzed so that the MPI interconnect can be power-optimized post-silicon. Emerging 3D memories have the potential to be employed as Level 2 and Level 3 caches, and this is explored using the Tezzaron 3D memory. As scaling and power optimization occur, the main memory increasingly dominates power consumption. Possible extensions to Cortical Processing are discussed.

3 citations


Cited by
Proceedings ArticleDOI
28 May 2019
TL;DR: In this paper, a misalignment test structure was fabricated in a Wafer-to-Wafer (W2W) assembly configuration with pitches of 3.42 µm and 1.44 µm, using a very small measurement step (45 nm and 22 nm, respectively) for accurate misalignment measurement.
Abstract: Cu/oxide Hybrid Bonding (HB) technology is currently the ultimate fine-pitch 3D interconnect solution for reaching submicron pitches. It is an attractive technique for addressing the needs of several applications such as smart imagers, high-performance computing, and memory-on-logic folding. But test and characterization of such fine-grained 3D interconnects is still an open issue: Cu-Cu interconnects are prone to many structural defects arising from the fabrication process, such as misalignment, which need to be thoroughly tested to ensure the performance of 3D-ICs. In this work, we focus on testing and characterizing, on-wafer, misalignment defects induced at the bonding step. A misalignment test structure was fabricated in a Wafer-to-Wafer (W2W) assembly configuration with pitches of 3.42 µm and 1.44 µm, using a very small measurement step (45 nm and 22 nm, respectively) for accurate misalignment measurement. Electrical tests were performed on five multi-pitch wafers with 71 measurement points per wafer. The experimental results show that the proposed test structure's measurements align with conventional overlay measurements. Finally, the impact of misalignment defects on resistance and capacitance parameters was demonstrated.

15 citations

Book ChapterDOI
Ravi Mahajan, Bob Sankman
01 Jan 2017
TL;DR: The advantages and limitations of 3D architectures are discussed to provide context for why 3D stacking has become a key area of interest for product architects, why it has generated broad industry attention, and why its adoption has been tenuous.
Abstract: In this chapter, the advantages and limitations of 3D architectures are discussed to provide context for why 3D stacking has become a key area of interest for product architects, why it has generated broad industry attention, and why its adoption has been tenuous. The primary focus of this chapter is on 3D architectures that use Through Silicon Vias (TSVs), while other System In Package (SIP) architectures that do not rely on TSVs are discussed for completeness. The key elements of a TSV-based 3D architecture are described, followed by a description of the three methods of manufacturing wafers with TSVs (i.e., Via-First, Via-Middle, and Via-Last). An analysis of the different assembly process flows for 3D structures, broadly classified as (a) Wafer-to-Wafer (W2W), (b) Die-to-Wafer (D2W), and (c) Die-to-Die (D2D) assembly processes, is covered. Key design, assembly process, test process, and materials considerations for each of these flows are described. The chapter concludes with a discussion of current and anticipated challenges for 3D architectures.

11 citations

Book
07 Dec 2018
TL;DR: A 3D multi-layer CMOS-RRAM accelerator architecture for incremental machine learning is proposed, utilizing an incremental least-squares solver to perform fast learning on the neural network with significant speed-up and energy-efficiency improvement.
Abstract: The Internet of Things (IoT) is the networked interconnection of everyday objects to provide intelligent services and improve economic benefit. The potential of IoT and its ubiquitous computing is staggering, but limited by many technical challenges. One challenge is responding in real time to dynamic ambient change. Machine learning accelerators on IoT edge devices are one potential solution, since a centralized system suffers long back-end processing latency. However, IoT edge devices are resource-constrained and machine learning algorithms are computationally intensive. Therefore, optimized machine learning algorithms, such as compact machine learning with a small memory footprint on IoT devices, are greatly needed. In this thesis, we explore the development of fast and compact machine learning accelerators by developing least-squares, tensor, and distributed solvers. Moreover, applications of such machine learning solvers on IoT devices, such as energy management systems, are also investigated. From the fast-machine-learning perspective, the target is to perform fast learning on the neural network. This thesis proposes a least-squares solver for a single-hidden-layer neural network. Furthermore, this thesis explores CMOS FPGA-based and RRAM-based hardware accelerators. A 3D multi-layer CMOS-RRAM accelerator architecture for incremental machine learning is proposed. By utilizing an incremental least-squares solver, the whole training process can be mapped onto the 3D multi-layer CMOS-RRAM accelerator with significant speed-up and energy-efficiency improvement. Experimental results using the CIFAR-10 benchmark show that the proposed accelerator has 2.05× speed-up, 12.38× energy-saving and 1.28× area-saving compared to a 3D-CMOS-ASIC hardware implementation, and 14.94× speed-up, 447.17× energy-saving and around 164.38× area-saving compared to a CPU software implementation. Compared to a GPU implementation, our work shows 3.07× speed-up and 162.86× energy-saving. In addition, a CMOS-based FPGA realization of a neural network with square-root-free Cholesky factorization is also investigated for training and inference. Experimental results have shown that our proposed accelerator on a Xilinx Virtex-7 has comparable accuracy with an average speed-up of 4.56× and 89.05×,

4 citations
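The thesis abstract above describes training a single-hidden-layer network by solving the output weights in closed form with a square-root-free Cholesky (LDL^T) factorization. A minimal sketch of that idea, assuming an ELM-style network with a fixed random hidden layer; the function names, `tanh` activation, and regularization parameter are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def ldlt_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via a
    square-root-free Cholesky (LDL^T) factorization."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    z = np.linalg.solve(L, b)       # forward substitution: L z = b
    w = z / d                       # diagonal solve: D w = z
    return np.linalg.solve(L.T, w)  # back substitution: L^T x = w

def train_slfn(X, y, n_hidden=32, reg=1e-3, seed=0):
    """Fit the output weights of a single-hidden-layer network in
    closed form: random fixed hidden layer, least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Regularized normal equations: (H^T H + reg*I) beta = H^T y
    beta = ldlt_solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return lambda Xn: np.tanh(Xn @ W + b) @ beta
```

The LDL^T form avoids square roots entirely, which is one reason it is attractive for fixed-point hardware realizations like the FPGA accelerator the abstract mentions.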

Proceedings ArticleDOI
01 Nov 2016
TL;DR: A 3D multilayer CMOS-RRAM accelerator for incremental least-squares-based learning on a neural network is introduced; results show that such a 3D accelerator can significantly reduce training time with acceptable accuracy.
Abstract: Incremental machine learning is required for future real-time data analytics. This paper introduces a 3D multilayer CMOS-RRAM accelerator for incremental least-squares-based learning on a neural network. Given buffered input data held in an RRAM memory layer, intensive matrix-vector multiplication is first accelerated on a digitized RRAM-crossbar layer. The remaining incremental least-squares operations for feature extraction and classifier training are accelerated on a CMOS ASIC layer, using an incremental Cholesky factorization accelerator designed with parallelism and pipelining in mind. Experimental results have shown that such a 3D accelerator can significantly reduce training time with acceptable accuracy. Compared to a 3D-CMOS-ASIC implementation, it achieves 1.28× smaller area, 2.05× faster runtime and 12.4× energy reduction. Compared to a GPU implementation, our work shows 3.07× speed-up and 162.86× energy-saving.

1 citation
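The paper above offloads matrix-vector multiplication to a digitized RRAM-crossbar layer, where voltages applied to the rows produce column currents that each sum one dot product. A rough software sketch of that behavior, assuming a simple uniform quantization of weights to discrete conductance levels; the level count and quantization scheme are illustrative assumptions, not the paper's:

```python
import numpy as np

def crossbar_matvec(W, x, n_levels=16):
    """Simulate a digitized RRAM-crossbar matrix-vector multiply.

    Weights are mapped to a finite set of conductance levels; applying
    input 'voltages' x to the rows and summing the per-cell 'currents'
    down each column yields one dot product per bitline.
    """
    w_max = np.abs(W).max()
    step = 2 * w_max / (n_levels - 1)
    # Snap each weight to the nearest of n_levels conductance levels.
    G = np.round((W + w_max) / step) * step - w_max
    # Column currents: I_j = sum_i V_i * G_ij (Kirchhoff's current law).
    return x @ G
```

The quantization error in each output is bounded by `sum(|x|) * step / 2`, so with enough conductance levels the analog result tracks the exact product closely, which is consistent with the paper's "acceptable accuracy" observation.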